ABSTRACT


The  concept  of  power  for  statistical  tests  of  hypotheses, 
and  various  ideas  connected  with  it,  are  described  and  illustrated. 
The  power  is  given  for  a number  of  the  common  statistical  tests,  and 
tables  are  supplied  which  facilitate  decisions  on  the  sample  sizes 
necessary  for  detecting  differences  between  means,  variances, 
proportions  defective,  etc.  with  prescribed  power. 
1 PRELIMINARY IDEAS AND DEFINITIONS.


1.1 Statistical Estimation.

In  many  practical  problems  we  examine  a sample  in 
order  to  be  able  to  say  something  about  the  population  from 
which  the  sample  was  taken.  For  instance,  we  may  want  to 
find out whether the mean breaking strength of a large consignment
of steel rods is greater than a specified value, or
whether  a machine  is  turning  out  an  unduly  large  proportion 
of  pistons  with  diameters  outside  the  specified  tolerance 
limits.  From  a sample,  and  especially  from  a small  sample, 
we  cannot  expect  to  get  exact  information  about  the  population. 
All  that  we  can  hope  to  do  is  to  determine  the  probability  that 
a statement which we make about the population is true. Naturally
we want this probability, other things being equal, to
be  as  large  as  possible. 

The types of problems which we can investigate statistically
by sampling fall into two main classes: problems of
estimation and problems of the testing of hypotheses. In problems
of estimation we are concerned with the numerical value
of  some  characteristic  of  the  population  such  as  the  mean 
breaking  strength  (for  a population  of  rods),  and  we  form  the 
best  estimate  we  can  of  this  quantity  from  measurements  on 
a random  sample.  The  size  of  the  sample  and  the  particular 
statistic  (a  function  of  the  measurements)  which  we  use  for 
estimation  are  more  or  less  within  our  control.  As  regards 
size,  the  larger  the  sample,  of  course,  the  more  accurate  the 
estimate,  but  questions  of  time  and  expense  often  seriously 
restrict  the  size  of  a practicable  sample,  and  in  routine  work 
samples  as  small  as  four  or  five  are  quite  common.  The 
statistic, or estimator as it is sometimes called, should possess
certain desirable properties; and in particular should be
consistent,  unbiased  and  as  efficient  as  possible.  It  is  said  to 
be  consistent  if  its  value  tends,  as  the  sample  size  increases 
indefinitely,  to  the  true  value  of  the  characteristic  which  is 
being estimated. It is unbiased if its expected value (that is,
the arithmetic mean of its values for a very large number of
similar  random  samples,  all  of  the  same  size)  is  equal  to  the 
true  value.  The  efficiency  is  measured  by  the  variance  of  the 
values  of  the  statistic  for  random  samples  of  the  same  size; 
the  less  this  variance  the  more  efficient  the  statistic,  since 
the  standard  deviation  (the  square  root  of  the  variance)  is  a 
measure  of  the  order  of  magnitude  of  the  error  which  may  be 
expected  when  the  unknown  population  value  is  estimated  from 
the  known  sample  statistic. 

If  several  statistics  are  available  for  estimating  the 
same  population  characteristic,  we  shall  naturally  choose  the 
most  efficient  one,  unless  it  is  so  much  more  troublesome  or 
time-consuming  to  calculate  that  the  gain  in  efficiency  is  more 
than  offset  by  the  loss  in  speed.  Both  the  arithmetic  mean  and 
the  median  of  a sample  from  a normal  population  are  consistent 
and  unbiased  statistics  for  estimating  the  mean  of  the  population, 
but  the  arithmetic  mean  is  more  efficient  than  the  median.  The 
latter, on the other hand, is somewhat easier to calculate. Again,
the variance $s^2$ of a sample of size N from a normal
population is, when multiplied by N/(N - 1), an unbiased estimate
of the population variance $\sigma^2$. Another estimate is
provided  by  the  range  of  the  sample  (the  difference  between 
the  greatest  and  the  least  values  in  the  sample),  the  estimated 
variance  being  equal  to  the  square  of  the  range  multiplied  by  a 
factor which depends on the sample size and which is obtainable
from tables for sizes up to 20. For small sample sizes
the  range  is  nearly  as  efficient  as  the  sample  variance  and  is 
much  easier  to  calculate. 
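
The properties described above can be checked roughly by simulation. The following is a minimal sketch, not part of the original report: it assumes a standard normal population, samples of size 5, and an arbitrary seed, and the function name `simulate_estimators` is purely illustrative. It compares the sampling variance of the mean and the median, and the average of $s^2$ before and after multiplication by N/(N - 1).

```python
import random
import statistics

def simulate_estimators(n=5, reps=10000, mu=0.0, sigma=1.0, seed=1):
    """Summarise the mean, median and two variance estimates over many samples."""
    rng = random.Random(seed)          # seed is an arbitrary illustrative choice
    means, medians = [], []
    sum_s2 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        means.append(m)
        medians.append(statistics.median(sample))
        # s^2 with divisor N, as in the text
        sum_s2 += sum((x - m) ** 2 for x in sample) / n
    avg_s2 = sum_s2 / reps
    return {
        "var_of_mean": statistics.pvariance(means),
        "var_of_median": statistics.pvariance(medians),
        "avg_s2": avg_s2,
        "avg_s2_corrected": avg_s2 * n / (n - 1),   # multiplied by N/(N - 1)
    }

result = simulate_estimators()
for name, value in result.items():
    print(f"{name:>18s} = {value:.4f}")
```

Under these assumptions the variance of the median should come out larger than that of the mean (lower efficiency), and the corrected variance estimate should average close to the true value 1, while the uncorrected $s^2$ averages close to (N - 1)/N.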

1.2  Confidence  Intervals. 

When  estimating  a characteristic  of  the  population  from 
a sample  statistic  it  is  very  desirable  to  know  how  much  trust 
we  can  place  in  the  estimate.  For  many  practically  important 
kinds of estimate it is possible to calculate a confidence interval,
within which, with a certain degree of confidence, we can
claim that the true value will lie. If, on the basis of a sample,
we calculate upper and lower 95% confidence limits a and b
for the estimation of a parameter $\theta$, we imply that the probability
of the truth of the statement $b < \theta < a$ is 0.95. This probability
is  to  be  interpreted  as  referring  to  the  relative  frequency  of 
correct  statements  among  a very  large  number  of  such  confidence 
statements,  each  made  on  the  basis  of  a separate  random  sample 
of the same size. Each sample will give rise to its own confidence
interval, which may or may not actually include the true
value,  but  it  will  do  so  in  95%  of  the  samples.  We  therefore  stand 
only  a 5%  chance  of  being  wrong  in  making  the  statement  on  the 
basis  of  a single  sample. 
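
The relative-frequency interpretation can be illustrated by a small simulation. This is only a sketch under assumed conditions that are not in the original text: a normal population with known standard deviation, so that the 95% limits are the sample mean plus or minus $1.96\,\sigma/\sqrt{N}$, and hypothetical values for the mean, standard deviation, sample size and seed.

```python
import math
import random

def coverage(n=5, reps=4000, mu=10.0, sigma=2.0, seed=2):
    """Fraction of 95% intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.959964 * sigma / math.sqrt(n)   # two-sided 95% normal point
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        b, a = xbar - half_width, xbar + half_width   # lower and upper limits
        if b < mu < a:
            hits += 1
    return hits / reps

observed = coverage()
print(f"observed coverage = {observed:.3f}")
```

Each simulated sample yields its own interval; the observed proportion of intervals containing the true mean should be close to 0.95.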

1.3 Tests of Hypotheses.

Instead  of  trying  to  estimate  the  precise  value  of  some 
population characteristic or parameter $\theta$, we may be more
interested  in  whether  or  not  it  exceeds  or  falls  below  a certain 
specified  level,  or  whether  it  lies  between  definitely  specified 
limits.  We  may,  for  example,  want  to  know  whether  the  mean 
breaking strength of a certain type of thread is at least 100 lb.
wt., or whether in a large lot of machined parts at least 98%
will have diameters within, say, 5 thousandths of an inch of a
given value. In such cases we use random samples to test a
certain hypothesis, generally called the null hypothesis, $H_0$.
For the example of the thread, the null hypothesis might be that
the true mean breaking strength $\mu \ge 100$ lb. wt. The alternative
hypothesis $H_1$ might then be that $\mu < 100$ lb. wt., and
we try to decide between these two hypotheses by making
measurements on one or more samples. In discussing a statistical
test it is well to be quite clear at the outset about the
nature of the null hypothesis and of the alternative hypothesis.

On  the  basis  of  the  sample  measurements  we  have  three 
possible courses of action. We can (1) reject the null hypothesis,
(2) accept it, or (3) hedge, and say that the results
are  indecisive  and  that  further  samples  should  be  taken.  If, 
for  any  reason,  we  are  limited  to  one  sample  of  a fixed  size, 
we  are  bound  to  adopt  one  of  the  first  two  courses.  In  two- 
sample  tests,  and  sequential  tests,  further  sampling  is  possible, 
but  most  of  the  classical  statistical  tests  are  based  on  the  fixed- 
sample  concept,  and  it  is  with  this  procedure  that  we  shall  be 
mainly  concerned. 
1.4 Errors of the First and Second Kind.

If  we  are  limited  to  acceptance  or  rejection  of  the  null 
hypothesis we can obviously go wrong in two ways, either by rejecting
the null hypothesis when it is really true (this is called
an error of the first kind) or by accepting the null hypothesis
when the alternative hypothesis is true (this is called an error
of the second kind). It is in practice always possible to devise
the test so that the probability of an error of the first kind is
definitely less than some fixed value less than 1 (it may be, for
instance, less than 0.05). At the same time, we should like the
probability of an error of the second kind to be as small as possible,
and we try to devise the test accordingly. One test is said
to  be  more  powerful  than  another  of  the  same  size  if  it  gives  a 
greater  probability  of  rejecting  the  null  hypothesis  when  it  is 
false,  or,  what  comes  to  the  same  thing,  a smaller  probability 
of  making  an  error  of  the  second  kind. 

1.5 Assumption of Normality.

A test  usually  consists  in  calculating  a certain  statistic 
T from  the  observed  sample  values  and  observing  whether  or  not 
T lies in a pre-determined region of values which is called the
region of rejection. If it does lie in this region the null hypothesis
is rejected. The region of rejection is determined from
the known distribution of T when the null hypothesis is true,
and  for  most  tests  in  common  use  it  is  necessary,  in  order  to 
specify  the  region  of  rejection,  to  assume  that  the  measured 
variable  is  normally  distributed  in  the  parent  population.  There 
are  some  tests  which  do  not  require  this  or  any  other  assumption 
about  the  population  distribution  and  which  are  therefore  known 
as distribution-free tests. Also, some work has been done on
sampling from a rectangular population and from a skew distribution
known as Pearson's Type III, but for the most part
the assumption of a normal population is regularly made. From
a good  deal  of  experimental  evidence  it  appears  that  a moderate 
degree  of  departure  from  normality  will  not  seriously  affect  the 
ordinary  tests.  If  the  departure  is  marked,  it  is  often  possible 
by transforming the variable (for example, by using log x instead
of x, or by using Fisher's transformation
$z = \tfrac{1}{2} \log[(1 + r)/(1 - r)]$ instead of r) to make the new variable
much more nearly normal in distribution.
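
As a small illustration of the second transformation, the sketch below evaluates Fisher's z for a few values of r, here taken to be a correlation coefficient strictly between -1 and 1. The function name `fisher_z` is an illustrative choice, not notation from the report.

```python
import math

def fisher_z(r):
    """Fisher's transformation z = (1/2) log[(1 + r)/(1 - r)], for -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

for r in (0.0, 0.5, 0.9):
    print(f"r = {r:.1f}  z = {fisher_z(r):.4f}")
```

The same quantity is the inverse hyperbolic tangent of r, so `math.atanh(r)` could be used instead.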

1.6 The Power of a Test

A null hypothesis regarding the value of a parameter $\theta$ is said
to be simple if it specifies the population completely. Otherwise, it
is said to be composite. If a variable x is normally distributed in
the population with known variance $\sigma^2$ but with unknown mean $\mu$,
the hypothesis that $\mu = \mu_0$ is a simple hypothesis, since the population
distribution is then completely determined. The alternative hypothesis
is that $\mu = \mu_1$, where $\mu_1$ is different from $\mu_0$. If we
are prepared to consider any possible alternative, we must allow
either $\mu_1 < \mu_0$ or $\mu_1 > \mu_0$, and this is called a two-sided alternative.
Sometimes, however, we are interested only in the possibility that
$\mu_1 > \mu_0$. We may, for example, want to know whether a new treatment
or process will improve the quality of a certain material and
we feel sure that the treatment cannot make it worse. In such cases,
we have a one-sided alternative.

Let us suppose that we want to test the simple hypothesis
$\theta = \theta_0$ against the simple alternative hypothesis $\theta = \theta_1$, using the
statistic T. Let us also suppose, for convenience, that T is distributed
continuously in the population of all possible samples of the
given size N with a density function* $f(t \mid \theta_0)$ for $\theta = \theta_0$. If the
rejection region is denoted by R, the size of the test is given by

$$\int_{(R)} f(t \mid \theta_0)\,dt = \alpha. \tag{1.1}$$

The power of the test is

$$P(\theta) = \int_{(R)} f(t \mid \theta)\,dt \tag{1.2}$$

for $\theta \ne \theta_0$. When $\theta = \theta_1$,

$$P(\theta_1) = 1 - \beta, \tag{1.3}$$

where $\beta$ is the probability of an error of the second kind.

* That is, the probability that T lies between t and t + dt for the given
value $\theta_0$ is $f(t \mid \theta_0)\,dt$. The vertical stroke may be read as "given".


If the distribution function of the statistic is $F(t \mid \theta_0)$, which
is the probability of a value equal to or less than t on the hypothesis
that $\theta = \theta_0$, we can write $f(t \mid \theta_0)\,dt = dF(t \mid \theta_0)$, and this notation
applies even when the statistic takes only discrete values. The distribution
function is then to be interpreted as meaning the sum of the
probabilities for all discrete values of T equal to or less than t.

The function $P(\theta)$ is called the power function of the test. The
ideal test would be one in which $\alpha = 0$ and $P(\theta) = 1$ for all $\theta \ne \theta_0$.
The curve of $P(\theta)$ would resemble that in Figure 1, and there would
be no errors of the first or second kind. The null hypothesis would
never be rejected when true and always rejected when false. This
happy state of affairs seldom, if ever, arises.


Fig.  1.  Power  Function  of  an  Ideal  Test. 

Instead of $P(\theta)$, the function $\beta(\theta) = 1 - P(\theta)$ is often used. In
this form it is called the operating characteristic (O.C., for short) of
the test.

In practice, the power function of the test is more likely to resemble
the curves of Fig. 2. If $\theta_1$ is near $\theta_0$, the power is small, which
means that there is small probability of rejecting the hypothesis $\theta = \theta_0$
when $\theta$ is really equal to $\theta_1$. However, in such a case no great harm is
done, because we are only replacing the true value by a nearby one.
When the distance between $\theta_0$ and $\theta_1$ is large, a useful test will have a
value of $P(\theta_1)$ near to 1.
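
To make the shape of a real power function concrete, here is a minimal sketch for one particular test, chosen only for illustration: a one-sided test of the mean of a normal population with known standard deviation, rejecting when the sample mean is large. For that test the power is $P(\mu) = 1 - \Phi(z_\alpha - \sqrt{N}(\mu - \mu_0)/\sigma)$, where $\Phi$ is the standard normal distribution function. The numerical values (mean 100, standard deviation 4, N = 16) and the function name are assumptions, not taken from the report.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu, mu0, sigma, n, alpha=0.05):
    """Power at true mean `mu` of the size-alpha test rejecting for large xbar."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)      # upper alpha point
    shift = math.sqrt(n) * (mu - mu0) / sigma        # standardized distance
    return 1.0 - NormalDist().cdf(z_alpha - shift)

for mu in (100.0, 101.0, 102.0, 104.0):
    p = power_one_sided_z(mu, mu0=100.0, sigma=4.0, n=16)
    print(f"mu = {mu:6.1f}  power = {p:.3f}  O.C. = {1 - p:.3f}")
```

At the null value the power equals the size of the test; it stays small for nearby alternatives and approaches 1 as the alternative moves away, which is the behaviour described above.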
Fig. 2. Typical Power Functions of a Real Test,
corresponding to different Regions of Rejection.


In general, the region of rejection R, for a given size $\alpha$, can
be chosen in many ways, each of which gives rise to a test with a
different power function. If it happens that the test corresponding to
a different region R' has a power curve which lies below that
corresponding to R for all values of $\theta$ except $\theta_0$, then the test using
R is uniformly more powerful than that using R'. The probability $\alpha$
of an error of the first kind is the same for both, but the probability
$\beta$ of an error of the second kind is always less for R than it is for
R'. That is, we shall in the long run more frequently go wrong if we
use the test based on R' to distinguish between $\theta_0$ and $\theta_1$ than if we use
the test based on R.

If this is true for every possible choice of R', then the test using R
is said to be uniformly most powerful, and if such a test can be found we
shall be perfectly satisfied. Unfortunately, however, such tests are
seldom available, and what often happens is that R' will give a power
curve which is above that of R for some values of $\theta$ and below it for
other values, as illustrated in Fig. 2.


1.7 The Neyman-Pearson Lemma


We suppose that X is a random variable which can take values
x lying in a certain region D. If X is discrete, like the number of
spots on the upper face of a die, the possible values of x are isolated
real numbers (e.g. 1, 2, 3, 4, 5, 6). If X is continuous, like a
measured height, the possible values of x may form an interval or
set of intervals of the real axis. If X is a set of N observations or
measurements, the region D is N-dimensional. If R is a sub-region
contained within D, the probability that x belongs to R is given by

$$\Pr(x \in R) = \int_{(R)} dF(x \mid \theta), \tag{1.4}$$

where $F(x \mid \theta)$ is the cumulative distribution function for x and depends
upon the value of a parameter $\theta$ (or possibly on several parameters).


If the variable X is continuous,

$$dF(x \mid \theta) = f(x \mid \theta)\,dx,$$

where $f(x \mid \theta)\,dx$ is the probability that x lies between x and x + dx
for the given value of $\theta$. If X is discrete, $\int_{(R)} dF(x \mid \theta)$ is the sum
of the probabilities $f(x \mid \theta)$ for each of the distinct possible values of
x which lie within the region R, for the given value of $\theta$.


Suppose we now want to test the simple null hypothesis $\theta = \theta_0$
against the simple alternative hypothesis $\theta = \theta_1$. (It is understood
that any other parameters occurring in the probability law of x are
known precisely.) Let the required level of significance be $\alpha$. Then
the test, which consists in rejecting $H_0$ when $x \in R$ and accepting $H_0$
in all other cases, will be a most powerful test if the critical region R
satisfies the two conditions:

$$\text{(i)} \quad \int_{(R)} dF(x \mid \theta_0) = \alpha,$$

$$\text{(ii)} \quad \int_{(R)} dF(x \mid \theta_1) = \text{max.} \tag{1.5}$$

Neyman and Pearson (1933) proved that if a region R exists
which satisfies (i) and is such that x belongs to R whenever

$$f(x \mid \theta_0) / f(x \mid \theta_1) \le c, \text{ where } c \text{ is some constant,}$$

and such that x does not belong to R whenever

$$f(x \mid \theta_0) / f(x \mid \theta_1) > c,$$

and if also R' is any other region for which

$$\int_{(R')} dF(x \mid \theta_0) \le \alpha,$$

then

$$\int_{(R')} dF(x \mid \theta_1) \le \int_{(R)} dF(x \mid \theta_1).$$

This means that the test using R is a most powerful one of size $\alpha$.
Unfortunately, a region with the stated properties may not exist.
This situation is likely to crop up when the distribution is discrete.
An example where the region does exist will be given in §2.1.


The ratio $f(x \mid \theta_0)/f(x \mid \theta_1)$ is called the likelihood ratio.
More generally, if the possible values of $\theta$ form a set denoted by $\Omega$,
and if the null hypothesis $H_0$ implies that $\theta$ belongs to a certain
sub-set $\omega$ of $\Omega$ (in symbols, $\theta \in \omega$), the likelihood ratio
L(x) is the ratio of the maximum value of f(x) under $H_0$ to the
maximum value under $H_1$, i.e.

$$L(x) = \max_{\theta \in \omega} f(x \mid \theta) \Big/ \max_{\theta \in \Omega - \omega} f(x \mid \theta). \tag{1.6}$$

If L(x) is small, the observed x is much more likely under $H_1$ than
it is under $H_0$, so that it would be unreasonable to maintain $H_0$.
The likelihood ratio test consists in rejecting $H_0$ when L(x) < c,

WADC TR 54-9


c being a constant so chosen that

$$\Pr\{L(x) < c \mid H_0\} = \alpha. \tag{1.7}$$

Most of the good tests known are likelihood ratio tests. In
many practical cases the statistic X is a set of N independent
observations forming a random sample. It has been proved (S.
Wilks, 1938) that when N is large the distribution of $-2 \log L(x)$
is approximately an ordinary $\chi^2$ distribution when $H_0$ is true and
a non-central $\chi^2$ distribution when $H_1$ is true. The size and power
of the test can be readily calculated from tables. Tables of $\chi^2$ are
readily available (e.g. Fisher and Yates, 1949 and C. M. Thompson,
1941-42), and Evelyn Fix (1949) has published a table of non-central
$\chi^2$.
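
The behaviour of $-2 \log L(x)$ can be examined in a simple special case. The sketch below is an illustration under assumptions not in the report: a normal sample with known standard deviation, null hypothesis $\mu = \mu_0$, and every other value of $\mu$ as the alternative. Here $-2 \log L(x)$ reduces to $N(\bar{x} - \mu_0)^2/\sigma^2$, whose distribution under the null hypothesis is exactly (not merely approximately) $\chi^2$ with one degree of freedom, so the value 3.841 is used as the upper 5% point. Function names, sample size and seed are hypothetical.

```python
import random

def minus_two_log_L(sample, mu0, sigma):
    """-2 log of the likelihood ratio (1.6) for a normal mean, sigma known."""
    n = len(sample)
    xbar = sum(sample) / n

    def log_lik(mu):
        # log-likelihood up to an additive constant that cancels in the ratio
        return -sum((x - mu) ** 2 for x in sample) / (2.0 * sigma ** 2)

    # Numerator: H0 fixes mu = mu0. Denominator: maximised at mu = xbar.
    return -2.0 * (log_lik(mu0) - log_lik(xbar))

def rejection_rate(mu, mu0=0.0, sigma=1.0, n=10, reps=5000, seed=3):
    """Proportion of simulated samples for which -2 log L exceeds 3.841."""
    rng = random.Random(seed)
    crit = 3.841          # upper 5% point of chi-square with 1 d.f.
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if minus_two_log_L(sample, mu0, sigma) > crit:
            count += 1
    return count / reps

print("rejection rate under H0 :", rejection_rate(mu=0.0))
print("rejection rate at mu = 1:", rejection_rate(mu=1.0))
```

The first rate estimates the size of the test and should be near 0.05; the second estimates its power at one particular alternative.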

1.8  The  Randomized  Neyman-Pearson  Lemma 

We can generalize the conditions (1.5) to read

$$\int_{(R)} dF(t \mid \theta) \le \alpha \tag{1.8}$$

for all $\theta \in \omega$, and

$$\int_{(R)} dF(t \mid \theta) = \text{max.} \tag{1.9}$$

for all $\theta \in \Omega - \omega$. Here $F(t \mid \theta)$ is the cumulative distribution
function of the statistic T and $\theta$ is the parameter being estimated.
The region of rejection R is chosen to satisfy (1.8) and (1.9).
According to (1.8) the size of the test is not greater than $\alpha$.

A possibility of increasing the power is afforded by allowing
randomized decisions, as suggested by Lehmann and Stein, 1948.
The total space of the statistic T is divided into three parts, $R_1$, $R_2$
and $R_3$. If T falls into $R_1$, $H_0$ is rejected; if in $R_3$, $H_0$ is accepted;
but if T falls in $R_2$, we toss a coin or look up in a table of random
numbers to decide whether to accept or reject $H_0$. In other words,
we reject $H_0$ with probability $\psi(T)$, whatever the value of T, $\psi(T)$
being 1 for $T \in R_1$, 0 for $T \in R_3$, and a number between 0 and 1
for $T \in R_2$. $\psi(T)$ is called a test function, or sometimes simply a
test.
The randomized Neyman-Pearson lemma states that if L(x)
is defined as in (1.6) and if a test function $\psi(x)$ is defined by

$$\psi(x) = \begin{cases} 1, & \text{when } L(x) < c, \\ 0, & \text{when } L(x) > c, \\ \psi_0(x), & \text{when } L(x) = c, \end{cases} \tag{1.10}$$

then the test which consists in rejecting $H_0$ with probability $\psi(x)$
is most powerful of size $\alpha$ for testing $H_0$ against $H_1$. The value of
c is given by

$$\Pr\{L(x) < c \mid H_0\} \le \alpha,$$

and $\psi_0(x)$ by

$$\Pr\{L(x) < c \mid H_0\} + \psi_0(x) \Pr\{L(x) = c \mid H_0\} = \alpha. \tag{1.11}$$

If the region $R_2$, where L(x) = c, contains only a single value of x,
$\psi_0(x)$ is unique, but if $R_2$ includes more than one value of x it
will in general be possible to choose different functions satisfying
(1.11).


1.9 Examples of Randomized Tests

Suppose  we  want  to  test  the  hypothesis  that  the  proportion  of 
defectives  in  a certain  manufactured  article  subject  to  inspection 
is  equal  to  or  less  than  10%,  and  that  we  want  to  do  so  on  a sample 
of  4 items  by  noting  the  number  x of  defective  articles  in  the  sample. 
Clearly,  x can  take  only  the  values  of  0,  1,  2,  3 or  4,  and  the  larger 
values  of  x will  lead  to  rejection  of  the  hypothesis. 


If the true proportion of defective articles is $\theta$, the probability
of exactly x defectives in a sample of 4 is

$$p(x \mid \theta) = \binom{4}{x} \theta^x (1 - \theta)^{4-x}. \tag{1.12}$$

For $\theta = 0.10$, this expression takes for x = 2, 3, 4 the values
0.0486, 0.0036, 0.0001 respectively, and still smaller values for
$\theta < 0.10$. The statistic T is here identical with x itself, so that if
we take the region of rejection as $R_1$ (x = 3 or 4), we have

$$\sum_{(R_1)} p(x \mid \theta) \le 0.0037$$

for all $\theta \le 0.10$. If, on the other hand, we include also $R_2$ (x = 2),

$$\sum_{(R_1 + R_2)} p(x \mid \theta) \le 0.0523,$$

with equality at $\theta = 0.10$, and so the size of the test is greater than
0.05. The power of the non-randomized test is

$$P(\theta) = \sum_{(R_1)} p(x \mid \theta) = \theta^4 + 4\theta^3 (1 - \theta)$$

for $\theta > 0.10$. For $\theta = 0.20$, this is 0.027 and for $\theta = 0.30$ it is 0.084.

Now let us consider a randomized test, in which, when x = 2,
we reject $H_0$ with probability $\psi_0$, where

$$\Pr\{x > 2 \mid H_0\} + \psi_0 \Pr\{x = 2 \mid H_0\} = 0.05, \tag{1.13}$$

i.e. $0.0037 + 0.0486\,\psi_0 = 0.05$, or $\psi_0 = 0.95$.

The power of the test is

$$P(\theta) = \sum_x \psi(x)\, p(x \mid \theta),$$

where $\psi(x) = 0$ for x = 0 or 1, 0.95 for x = 2 and 1 for x = 3 or 4.
Hence $P(\theta) = \theta^4 + 4\theta^3 (1 - \theta) + 0.95 \cdot 6\,\theta^2 (1 - \theta)^2$, which is equal to
0.173 for $\theta = 0.20$ and to 0.335 for $\theta = 0.30$.

It was unnecessary in the above work to determine the value of
the likelihood ratio L(x). However, it is easily seen, by maximizing
$\theta^x (1 - \theta)^{4-x}$, that $L(x) = (10/9)^4$ when x = 0 and
$L(x) = (0.1)^x (0.9)^{4-x} / [(x/4)^x (1 - x/4)^{4-x}]$ when x > 0. The value
for x = 2 is 0.1296, which is the c of (1.11). The probability that
L(x) = c is the same as the probability that x = 2.

A practical  method  of  rejecting  HQ  with  probability  0.95  would 
be  to  use  a table  of  random  2-digit  numbers.  Before  opening  the 
table,  decide  on  a particular  page,  a particular  column,  and  a 
particular position in the column (say the 7th from the top). Then
look up the corresponding number, and if it lies between 00 and 94
inclusive,  reject  the  hypothesis. 
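
The arithmetic of this example can be reproduced with a short script. This is a sketch for checking the figures, with illustrative names (`p`, `psi`, `power`, `likelihood_ratio`) that are not from the report. Note that it keeps the unrounded value of the rejection probability at x = 2 (about 0.9527), whereas the text rounds it to 0.95, so the computed powers differ slightly in the third decimal place.

```python
from math import comb

N, THETA0, ALPHA = 4, 0.10, 0.05

def p(x, theta):
    """Binomial probability (1.12) of x defectives in a sample of N."""
    return comb(N, x) * theta ** x * (1 - theta) ** (N - x)

# Solve (1.13) for the rejection probability at x = 2.
tail = p(3, THETA0) + p(4, THETA0)        # Pr{x > 2 | H0} = 0.0037
psi0 = (ALPHA - tail) / p(2, THETA0)      # unrounded; the text uses 0.95

def psi(x):
    """Test function: reject for x = 3, 4; randomize at x = 2."""
    if x >= 3:
        return 1.0
    return psi0 if x == 2 else 0.0

def power(theta):
    """Power of the randomized test."""
    return sum(psi(x) * p(x, theta) for x in range(N + 1))

def power_nonrandomized(theta):
    """Power of the test that rejects only for x = 3 or 4."""
    return p(3, theta) + p(4, theta)

def likelihood_ratio(x):
    """L(x) of (1.6) with omega: theta <= 0.10 and Omega - omega: theta > 0.10."""
    if x == 0:
        return 1.0 / (1 - THETA0) ** N    # numerator at theta = 0
    return p(x, THETA0) / p(x, x / N)     # numerator at 0.10, denominator at x/N

print(f"psi0 = {psi0:.4f}")
for theta in (0.20, 0.30):
    print(theta, round(power_nonrandomized(theta), 4), round(power(theta), 4))
print("L(2) =", round(likelihood_ratio(2), 4))
```

Replacing `psi0` by the rounded value 0.95 gives the powers quoted in the text.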
In the following example, $\psi_0(x)$ is not unique. Let the null hypothesis
$H_0$ be that x is a random item from a rectangular distribution
of mean 2 and range 2, and let the alternative hypothesis $H_1$ be that
x is from a rectangular distribution of mean 4 and range 4 (see Fig. 3).


Fig. 3. Two Rectangular Distributions (densities under $H_0$ and $H_1$, plotted for x from 0 to 6).


The null hypothesis must obviously be accepted if $1 \le x < 2$
and rejected if $3 < x \le 6$. The only doubt arises when x lies between
2 and 3 in the region where the two distributions overlap.

Clearly,

$$L(x) = \infty, \quad 1 \le x < 2,$$

$$L(x) = 2, \quad 2 \le x \le 3,$$

$$L(x) = 0, \quad 3 < x \le 6.$$

The probability $\psi_0$ (taken as constant for x between 2 and 3)
is here given by

$$\psi_0 = \frac{\alpha - \Pr\{3 < x \le 6 \mid H_0\}}{\Pr\{2 \le x \le 3 \mid H_0\}} = \alpha / (1/2) = 0.1$$

if $\alpha = 0.05$.

The power of the test is P, where




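$$P = \Pr\{3 < x \le 6 \mid H_1\} \cdot 1 + \Pr\{2 \le x \le 3 \mid H_1\} \cdot \psi_0 = \frac{3}{4} + \frac{1}{4} \cdot \frac{1}{10} = \frac{31}{40}.$$

Another possible value of $\psi_0$ would be

$$\psi_0(x) = (x - 2)/5, \quad 2 \le x \le 3,$$

and this may easily be shown to give the same size and power as
$\psi_0(x) = 0.1$.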
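
The claim that both choices give the same size and power can be verified exactly. The sketch below uses rational arithmetic and assumes the rectangular densities are 1/2 on the interval 1 to 3 under the null hypothesis and 1/4 on the interval 2 to 6 under the alternative, as implied by the stated means and ranges. The helper `integrate_linear` is an illustrative name.

```python
from fractions import Fraction as F

def integrate_linear(a, b, lo, hi):
    """Exact integral of a + b*x over the interval [lo, hi]."""
    return a * (hi - lo) + b * (hi * hi - lo * lo) / 2

f0 = F(1, 2)   # density under H0 on [1, 3]
f1 = F(1, 4)   # density under H1 on [2, 6]

# Integral of the test function over the overlap [2, 3] for each choice.
const_weight = integrate_linear(F(1, 10), F(0), 2, 3)       # psi0 = 1/10
linear_weight = integrate_linear(F(-2, 5), F(1, 5), 2, 3)   # psi0(x) = (x - 2)/5

# Size: H0 puts no probability above 3, so only the overlap contributes.
size_const = f0 * const_weight
size_linear = f0 * linear_weight

# Power: certain rejection on (3, 6] plus the randomized part on [2, 3].
power_const = f1 * (6 - 3) + f1 * const_weight
power_linear = f1 * (6 - 3) + f1 * linear_weight

print(size_const, size_linear)     # both should be 1/20
print(power_const, power_linear)   # both should be 31/40
```

Both test functions integrate to 1/10 over the overlap, which is why size and power agree.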

1.10 Similar Tests and Unbiased Tests

A test $\psi(X)$ will be most powerful for a given size $\alpha$, according
to the criteria stated above, if the following conditions are fulfilled:

$$\int \psi(x)\, dF(x \mid \theta) \le \alpha \quad \text{for } \theta \in \omega, \tag{1.14}$$

$$\int \psi(x)\, dF(x \mid \theta) = \text{max.} \quad \text{for } \theta \in \Omega - \omega. \tag{1.15}$$

That is to say, the probability of rejection of the null hypothesis
when this hypothesis is true (i.e. when $\theta$ belongs to the set $\omega$) is
never greater than $\alpha$, and the probability of rejection when this
hypothesis is false is as great as possible.

The test is said to be similar if strict equality holds in
equation (1.14). Neyman and Pearson (1928) originally considered
only similar tests, but it is not always convenient to limit oneself to
this case, as it puts a considerable restriction on the hypotheses
available for testing. Thus, if $X_1, X_2, \ldots, X_n$ are independent
normal variates with mean $\mu$ and variance $\sigma^2$, and we want to
test whether $\mu > 0$, we can get a similar test of $H_0$ ($\mu = 0$)
against $H_1$ ($\mu > 0$) but not of $H_0$ ($\mu \le 0$) against $H_1$ ($\mu > 0$). The
latter case is, however, at least as worth while investigating as the
former.


For this reason, another criterion of a good test was
introduced by Neyman and Pearson, that of being unbiased. The
test $\psi(X)$ is unbiased if

$$\text{(a)} \quad \int \psi(x)\, dF(x \mid \theta) \le \alpha, \quad \text{for } \theta \in \omega,$$

$$\text{(b)} \quad \int \psi(x)\, dF(x \mid \theta) \ge \alpha, \quad \text{for } \theta \in \Omega - \omega.$$

That is, the probability of rejecting the null hypothesis is at least as
great when the hypothesis is false as when it is true. This is obviously
a desirable characteristic of a test. For a simple hypothesis
$\theta = \theta_0$, the power curve of an unbiased test of this hypothesis has an
absolute minimum at $\theta = \theta_0$. If the curve has a minimum at $\theta = \theta_0$,
in the neighborhood of $\theta_0$, but falls below the value corresponding to
$\theta_0$ at one or more points distant from $\theta_0$, the test is said to be
locally unbiased.
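
A concrete contrast may help here. The sketch below is an illustration under assumptions not in the report: a normal mean with known variance, a simple null hypothesis, and a two-sided alternative, with the standardized shift $\delta = \sqrt{N}(\mu - \mu_0)/\sigma$ as the argument. It compares a one-sided test (rejecting only for large sample means) with an equal-tailed two-sided test of the same size. The one-sided test has power below the size for negative shifts, so it is biased against the two-sided alternative, whereas the equal-tailed test has its minimum power at zero shift.

```python
from statistics import NormalDist

STD = NormalDist()   # standard normal distribution

def power_one_sided(delta, alpha=0.05):
    """Power of the test rejecting for large xbar, at standardized shift delta."""
    return 1.0 - STD.cdf(STD.inv_cdf(1.0 - alpha) - delta)

def power_two_sided(delta, alpha=0.05):
    """Power of the equal-tailed two-sided test at standardized shift delta."""
    z = STD.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - STD.cdf(z - delta)) + STD.cdf(-z - delta)

for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"delta = {delta:+.1f}  one-sided = {power_one_sided(delta):.4f}"
          f"  two-sided = {power_two_sided(delta):.4f}")
```

For a one-sided alternative the first test would of course be the appropriate one; the bias appears only when alternatives on both sides are admitted.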

1.11 Sufficient Statistics

Given a set of random variables $X_1, X_2, \ldots, X_N$ from a
population which has a distribution characterized by a parameter $\theta$
(which may be one of a set of parameters), we can use certain
functions of the $X_i$ (or of some of them) to estimate $\theta$. Thus, if $\theta$ is
the mean of the population we can estimate it by the arithmetic mean
of all the variables, or by the median, or by half the sum of the
smallest and the largest, or in many other ways, but not all these
ways are equally good. Some of them, although they may be quick,
fail to utilize all the available information. R. A. Fisher defined a
sufficient statistic as one containing all the relevant information in
the sample. Thus if the $X_i$ are independent and come from a normal
population of unknown mean $\mu$ and known variance $\sigma^2$, it is possible
to change the variables to a new independent set $Y_1, Y_2, \ldots, Y_N$, which
are linearly related to the X's and are such that $Y_1$ is normal with
mean $\sqrt{N}\,\mu$ and variance $\sigma^2$, while $Y_2, Y_3, \ldots, Y_N$ are all normal
with mean 0 and variance $\sigma^2$. If the hypothesis which we wish to
test relates to $\mu$, it is clear that $Y_2, Y_3, \ldots, Y_N$ contribute no
information, and all we need worry about is $Y_1$. It is easily verified
that the following set of Y's satisfies the required conditions:

$$Y_1 = (X_1 + X_2 + \cdots + X_N)/\sqrt{N}$$

$$Y_2 = (X_1 - X_2)/\sqrt{2}$$

$$Y_3 = (X_1 + X_2 - 2X_3)/\sqrt{6}$$

$$\cdots$$

$$Y_N = (X_1 + X_2 + \cdots + X_{N-1} - (N - 1) X_N)/\sqrt{N(N - 1)}$$

and $Y_1$ is simply the arithmetic mean of the X's, multiplied by $\sqrt{N}$.
The arithmetic mean is therefore a sufficient statistic for estimating $\mu$.
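
The verification mentioned above amounts to checking that the coefficients of the Y's form an orthonormal set of rows, which is what preserves independence and the common variance for normal variables. A minimal numerical sketch follows; the function names and the choice N = 5 are illustrative assumptions.

```python
import math

def helmert_rows(n):
    """Coefficient rows of Y_1, ..., Y_n in terms of X_1, ..., X_n."""
    rows = [[1.0 / math.sqrt(n)] * n]                  # Y_1: sum / sqrt(n)
    for k in range(2, n + 1):
        scale = math.sqrt(k * (k - 1))
        # Y_k = (X_1 + ... + X_{k-1} - (k - 1) X_k) / sqrt(k (k - 1))
        rows.append([1.0 / scale] * (k - 1) + [-(k - 1) / scale] + [0.0] * (n - k))
    return rows

def transform(xs):
    """Apply the linear change of variables to one sample."""
    return [sum(c * x for c, x in zip(row, xs)) for row in helmert_rows(len(xs))]

def max_orthonormality_error(n):
    """Largest deviation of the row inner products from the identity matrix."""
    rows = helmert_rows(n)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            worst = max(worst, abs(dot - (1.0 if i == j else 0.0)))
    return worst

print("orthonormality error:", max_orthonormality_error(5))
print("Y for sample 1..5   :", [round(y, 4) for y in transform([1, 2, 3, 4, 5])])
```

The first transformed value equals the sample mean multiplied by the square root of N, in agreement with the statement in the text.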


A more precise definition is the following: The statistic T is
sufficient for estimating $\theta$ if for any other statistic T' the conditional
probability (or probability density, in the case of a continuous variable)
of T', given T, is independent of $\theta$. The probability of the observed
sample is then given by

$$p(x, \theta) = p_1(T, \theta)\, g(x), \tag{1.16}$$

where $p_1(T, \theta)$ is the probability of the statistic T and g(x) is a
function of the sample values which is independent of $\theta$. Hence, if
$L = \log p$,

$$L = L_1(T, \theta) + L_2, \tag{1.17}$$

where $L_2$ is independent of $\theta$, so that a knowledge of $L_1$ will give
all the information about $\theta$ obtainable from the sample.

The condition of sufficiency does not determine T completely.
Any function of T is also sufficient. We naturally choose a function
which gives a consistent estimate of $\theta$, and if possible one that is unbiased.
In problems of testing hypotheses we can restrict ourselves
to sufficient statistics, because of the theorem that if $\psi(X)$ is any
test and if T is a sufficient statistic, it is possible to find a test $\Psi(T)$,
depending on T only, which has the same power function as $\psi(X)$ and
so is equivalent to $\psi(X)$. This test is in fact defined by

$$\Psi(t) = \int \{\psi(x) \mid t\}\, dF(x \mid \theta), \tag{1.18}$$

where t is any observed value of T; that is, $\Psi(t)$ is the conditional
expectation of $\psi(X)$ given T = t, which by sufficiency does not depend
on $\theta$.

TESTING THE MEAN OF A SAMPLE FROM A NORMAL POPULATION
OF KNOWN VARIANCE.

2.1 Simple Hypothesis against Simple Alternative (One-sided)

Let $X_1, X_2, \ldots, X_N$ be a set of N independent observations
of a variable which is normally distributed in the population with
unknown mean $\mu$ and known variance $\sigma^2$. At first sight, this seems
rather an unreasonable assumption to make, but it is not without
justification in some circumstances. It may happen that conditions
affecting this variable have changed in such a way that the mean value
is pushed up or down while the amount of variation about the mean
is substantially the same as before. The variance may then be
estimated with considerable accuracy from previous observation,
but the new mean is not known.


Let the null hypothesis $H_0$ be that $\mu = \mu_0$ and the alternative
hypothesis $H_1$ be that $\mu = \mu_1$, where we may suppose $\mu_1 > \mu_0$. The
test is therefore of a simple hypothesis against a simple alternative.
The statistic used to estimate $\mu$ is the arithmetic mean $\bar{x}$, the
probability density of which is

$$f(\bar{x} \mid \mu_0) = (N/2\pi\sigma^2)^{1/2} \exp[-N(\bar{x} - \mu_0)^2/2\sigma^2]. \tag{2.1}$$

The likelihood ratio described in the Neyman-Pearson Lemma, § 1.7,
is therefore

$$\frac{f(\bar{x} \mid \mu_0)}{f(\bar{x} \mid \mu_1)} = \exp\left\{-\frac{N}{2\sigma^2}\left[(\bar{x} - \mu_0)^2 - (\bar{x} - \mu_1)^2\right]\right\}, \tag{2.2}$$

so that the condition $f(\bar{x} \mid \mu_0)/f(\bar{x} \mid \mu_1) < c$ is equivalent to

$$2\bar{x}(\mu_1 - \mu_0) > c_1 \quad \text{or} \quad \bar{x} > c_2, \tag{2.3}$$

where $c_1$ and $c_2$ are other constants depending on c and on the
known values of $\mu_0$, $\mu_1$, $\sigma$ and N. The size of the test is given by

$$\alpha = \int_{c_2}^{\infty} (N/2\pi\sigma^2)^{1/2} \exp[-N(\bar{x} - \mu_0)^2/2\sigma^2]\, d\bar{x}, \tag{2.4}$$

since R is here the interval $c_2$ to $\infty$. If $v = N^{1/2}(\bar{x} - \mu_0)/\sigma$
and $v_0 = N^{1/2}(c_2 - \mu_0)/\sigma$, equation (2.4) gives

$$\alpha = 1 - \Phi(v_0), \tag{2.5}$$

where $\Phi(v_0) = \int_{-\infty}^{v_0} \phi(v)\, dv$ and $\phi(v) = (2\pi)^{-1/2} e^{-v^2/2}$,
whence $v_0$, and therefore $c_2$, can be obtained. Thus, for $\alpha = 0.05$,
$v_0 = 1.645$.

Fig. 4. Power Curves for One-Sided Normal Test.


The power of the test is the probability of rejecting $H_0$ when
$\mu$ is really equal to $\mu_1$, that is, the probability that $\bar{x} > c_2$ when
the probability density is given by

$$f(\bar{x} \mid \mu_1) = (N/2\pi\sigma^2)^{1/2} \exp[-N(\bar{x} - \mu_1)^2/2\sigma^2].$$

The power is therefore

$$P(z) = \int_{c_2}^{\infty} f(\bar{x} \mid \mu_1)\, d\bar{x} = 1 - \Phi(v_1), \tag{2.6}$$

where $v_1 = N^{1/2}(c_2 - \mu_1)/\sigma = v_0 - (\mu_1 - \mu_0)N^{1/2}/\sigma = 1.645 - N^{1/2} z$,
with $z = (\mu_1 - \mu_0)/\sigma$.

Some power curves are shown in Fig. 4. For example, if z = 0.3
and N = 9, $P(z) = 1 - \Phi(0.745) = 0.228$.
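The worked figure above can be reproduced in a few lines. This is a minimal sketch using the normal distribution from the Python standard library; the function name `one_sided_power` is an assumption made for illustration.

```python
from statistics import NormalDist

def one_sided_power(z, n, alpha=0.05):
    """Power (2.6): 1 - Phi(v0 - sqrt(n) * z), where z = (mu_1 - mu_0) / sigma."""
    std = NormalDist()
    v0 = std.inv_cdf(1.0 - alpha)        # upper alpha point, 1.645 for alpha = 0.05
    return 1.0 - std.cdf(v0 - n ** 0.5 * z)

if __name__ == "__main__":
    # Example from the text: z = 0.3, N = 9
    print(round(one_sided_power(0.3, 9), 3))
```

With z = 0 the function returns the size of the test, which is a convenient sanity check on the sign conventions.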


2.2 Size of Sample necessary for Detecting a given Difference
in the Mean

The simple example in § 2.1 illustrates how power functions
may be used in designing experiments. If the true mean differs
from the assumed mean $\mu_0$, we may or may not be able to detect
this difference by using a sample of size N. Let us suppose that
we want to have at least a 100P% chance of detecting a difference
equal to z times the standard deviation. Since P(z) in equation
(2.6) is the probability of rejecting the hypothesis that the mean is
$\mu_0$ when it is really $\mu_1$, we have

$$P = 1 - \Phi(1.645 - zN^{1/2}). \tag{2.7}$$

Some values of N calculated from this equation, for given values of
P and z, are collected in Table I. It appears, for instance, that
to have a 90% chance of detecting a difference in the means equal
to 0.3 of the standard deviation (with a test which has only a 5%
chance of claiming that such a difference exists when in fact there
is no difference at all) we need a sample size of at least 96. This
assumes that any difference that does exist can only be an increase,
but the same result holds if we know that the difference must be a
decrease. The two-sided alternative will be discussed in the next
section. In the first two lines of Table I, the sample sizes, being
large, have been rounded off.
TABLE I

Size of Sample necessary to detect with Probability P a One-sided
Difference in the Mean equal to $z\sigma$ (Normal Population).

   z \ P      0.9       0.8       0.7       0.6       0.5

   0.01     85,700    61,900    47,100    36,100    27,100
   0.02     21,410    15,460    11,770     9,010     6,770
   0.05      3,426     2,474     1,883     1,442     1,083
   0.1         857       619       471       361       271
   0.2         215       155       118        91        68
   0.3          96        69        53        41        31
   0.4          54        39        30        23        17
   0.5          35        25        19        15        11
   0.6          24        18        14        11         8
   0.7          18        13        10         8         6
   0.8          14        10         8         6         5
   0.9          11         8         6         5         4
   1.0           9         7         5         4         3
   1.5           4         3         3         2         2

The probability of apparently detecting a difference that does not
exist is assumed to be 0.05.
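Equation (2.7) can be solved for N in closed form, which gives a quick way of checking entries of Table I. The sketch below is illustrative only; the function name `one_sided_sample_size` is hypothetical, and it assumes the required power exceeds the size of the test.

```python
import math
from statistics import NormalDist

def one_sided_sample_size(z, power, alpha=0.05):
    """Smallest N satisfying 1 - Phi(v_alpha - z * sqrt(N)) >= power, from (2.7).

    Rearranging gives sqrt(N) >= (v_alpha + v_power) / z, where v_alpha and
    v_power are the normal deviates exceeded with probability alpha and
    1 - power respectively.
    """
    std = NormalDist()
    v_alpha = std.inv_cdf(1.0 - alpha)
    v_power = std.inv_cdf(power)
    return math.ceil(((v_alpha + v_power) / z) ** 2)

if __name__ == "__main__":
    for z in (0.1, 0.3, 0.5, 1.0):
        print(z, [one_sided_sample_size(z, p) for p in (0.9, 0.8, 0.7, 0.6, 0.5)])
```

For the two smallest values of z the table gives rounded figures, so exact agreement should only be expected from z = 0.05 upwards.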
2.3 Simple Hypothesis against Two-sided Alternative

Let the null hypothesis $H_0$ be that $\mu = \mu_0$ and the alternative
hypothesis $H_1$ that $\mu > \mu_0$ or $\mu < \mu_0$. Let us also agree to reject $H_0$ if
either $\bar{x} - \mu_0 > c$ or $\mu_0 - \bar{x} > c$, where c is a fixed value, determined
by the size $\alpha$ of the test. By (1.1), we have

$$\int_{-\infty}^{\mu_0 - c} f(\bar{x} \mid \mu_0)\, d\bar{x} + \int_{\mu_0 + c}^{\infty} f(\bar{x} \mid \mu_0)\, d\bar{x} = \alpha, \tag{2.8}$$

where $f(\bar{x} \mid \mu_0)$ is given by (2.1).

Due to the symmetry of the distribution, both integrals are
equal to $\alpha/2$.

Putting $v = N^{1/2}(\bar{x} - \mu_0)/\sigma$, $\phi(v) = (2\pi)^{-1/2} e^{-v^2/2}$
and $\Phi(u) = \int_{-\infty}^{u} \phi(v)\, dv$, we find

$$\int_{cN^{1/2}/\sigma}^{\infty} \phi(v)\, dv = 1 - \Phi(cN^{1/2}/\sigma) = \alpha/2, \tag{2.9}$$

so that c is known when $\alpha$ has been fixed. If, for example, $\alpha = 0.05$,
we readily find from tables of the normal law that $\Phi(cN^{1/2}/\sigma) = 0.975$,
whence $c = 1.96\, \sigma N^{-1/2}$.

The power of the test, by (1.2), is given by

$$P(\mu) = 1 - \int_{\mu_0 - c}^{\mu_0 + c} f(\bar{x} \mid \mu)\, d\bar{x}
= 1 - \Phi[1.96 - N^{1/2}\sigma^{-1}(\mu - \mu_0)] + \Phi[-1.96 - N^{1/2}\sigma^{-1}(\mu - \mu_0)].$$
If $z = (\mu - \mu_0)/\sigma$, the power function in terms of z is

$$P(z) = 1 - \Phi(1.96 - N^{1/2} z) + \Phi(-1.96 - N^{1/2} z). \tag{2.10}$$

This function is drawn, for a few selected values of N, in
Figure 5. The curves show clearly how, as the sample size
increases, the test approximates more and more closely to
the ideal test pictured in Fig. 1.

Figure 5 also shows that for a sample of 9 the test is not
very powerful in rejecting a false hypothesis. Even when the
true population mean and the hypothetical mean differ by a whole
standard deviation, the chance of not detecting this discrepancy
is about 0.15.

We now prove that this test is identical with the likelihood
ratio test.

The probability of the observed set of N sample values,
$x_1, x_2, \ldots, x_N$, on the assumption that the true mean is $\mu$, is

$$(2\pi\sigma^2)^{-N/2} \exp\left[-\sum_i (x_i - \mu)^2/2\sigma^2\right]$$

and this is the probability that is to be maximized in (1.6).
Since in this example $\omega$ consists of the single value $\mu_0$, the
numerator is simply

$$(2\pi\sigma^2)^{-N/2} \exp\left[-\sum (x_i - \mu_0)^2/2\sigma^2\right].$$

The denominator is a maximum when $\mu = \bar{x}$, so that

$$L = \frac{\exp\left[-\sum (x_i - \mu_0)^2/2\sigma^2\right]}{\exp\left[-\sum (x_i - \bar{x})^2/2\sigma^2\right]}$$

or

$$\log L = \left[\sum (x_i - \bar{x})^2 - \sum (x_i - \mu_0)^2\right]/2\sigma^2. \tag{2.11}$$

The condition $-2 \log L > c_1^2$ corresponds therefore to
$(\bar{x} - \mu_0)^2 > c_1^2 \sigma^2/N$, so that $\bar{x} - \mu_0 > c_1 \sigma N^{-1/2}$ or
$\bar{x} - \mu_0 < -c_1 \sigma N^{-1/2}$, and so is the same as the test given above
with $c_1 \sigma N^{-1/2} = c$. To say that $-2 \log L > c_1^2$ is of course the
same as to say that $L < e^{-c_1^2/2}$.

Since $-2 \log L = N(\bar{x} - \mu_0)^2/\sigma^2$, which on the null hypothesis
is the square of a standard normal variate, it follows that
its distribution is that of $\chi^2$ with 1 degree of freedom. The asymptotic
approximation mentioned in § 1 is in this case exact for all N.
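The two-sided power function (2.10) is equally easy to evaluate directly. The following is a minimal sketch (the function name `two_sided_power` is an assumption for illustration) which reproduces the remark about a sample of 9: at z = 1 the power is about 0.85, so the chance of missing the discrepancy is about 0.15.

```python
from statistics import NormalDist

def two_sided_power(z, n, alpha=0.05):
    """Power (2.10): 1 - Phi(u - sqrt(n) z) + Phi(-u - sqrt(n) z)."""
    std = NormalDist()
    u = std.inv_cdf(1.0 - alpha / 2.0)   # 1.96 for alpha = 0.05
    shift = n ** 0.5 * z
    return 1.0 - std.cdf(u - shift) + std.cdf(-u - shift)

if __name__ == "__main__":
    for n in (9, 25, 100):
        print(n, [round(two_sided_power(z, n), 3) for z in (0.0, 0.3, 0.5, 1.0)])
```

The function is symmetric in z and equals the size of the test at z = 0, as the power curves require.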

2.4 Size of Sample necessary for Detecting a given Difference in
the Mean with Two-sided Alternative.

Since P(z) in equation (2.10) is the probability of rejecting
the hypothesis that the mean is $\mu_0$ when it is really $\mu$, we must take
P(z) as given, say 0.80. Then

$$\Phi(1.96 - zN^{1/2}) - \Phi(-1.96 - zN^{1/2}) = 0.2. \tag{2.12}$$

The same numerical values occur whether z > 0 or z < 0,
because of the symmetry of the normal curve. It is clear from
Fig. 5 that for z = 0.1 the value of N will be near to 900 (the ordinate
at z = 0.1 and the abscissa through P(z) = 0.8 meet at a point
between the curves for N = 100 and N = 900, but much nearer to the
latter). Similarly, for z = 0.3, N is a little under 100, and for z = 1,
it is a little under 9. Hence $zN^{1/2}$ is approximately equal to 3,
and $\Phi(-1.96 - zN^{1/2})$ is therefore practically zero. Since
$\Phi(-0.8416) = 0.2$, equation (2.12) is approximately equivalent to
$1.96 - zN^{1/2} = -0.8416$, or $N = (2.802/z)^2$, whence N is readily
calculated for any given z. Similarly for a power 0.90, we should
have $N = (3.242/z)^2$ and for a power 0.50, $N = (1.960/z)^2$. These
approximations may be checked by substituting in the exact equation
(2.12). We thus arrive at the values given in Table II, for sample
sizes necessary to detect a difference of $z\sigma$ in the mean with the
assigned probabilities, the probability of falsely rejecting the null
hypothesis being assumed to be 0.05. Since N must be integral, the
left-hand side of equation (2.12) will actually be $\le 0.2$, but N is the
smallest value for which this inequality holds.

TABLE II

Size of Sample necessary to detect with Probability P a
Difference either way of $z\sigma$ in the Mean. (Normal Population).

The probability of stating that a difference exists, when in fact
it does not, is supposed to be 0.05 throughout.

For the same values of z and P, the value of N given by
Table II is greater than that given in Table I. This is because we no
longer know that the difference to be looked for is in a given direction.
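The procedure just described (approximate N by neglecting the far tail, then check against the exact equation) can be sketched as follows. The function names are hypothetical, and the search assumes z is not zero and the required power exceeds the size of the test.

```python
from statistics import NormalDist

STD = NormalDist()

def two_sided_power(z, n, alpha=0.05):
    """Exact power (2.10) of the two-sided test."""
    u = STD.inv_cdf(1.0 - alpha / 2.0)
    shift = n ** 0.5 * z
    return 1.0 - STD.cdf(u - shift) + STD.cdf(-u - shift)

def approx_sample_size(z, power, alpha=0.05):
    """Approximation neglecting the far tail, e.g. (2.802 / z)^2 for power 0.80."""
    u = STD.inv_cdf(1.0 - alpha / 2.0)
    return ((u + STD.inv_cdf(power)) / z) ** 2

def two_sided_sample_size(z, power, alpha=0.05):
    """Smallest integer N whose exact power reaches the required value."""
    n = 1
    while two_sided_power(z, n, alpha) < power:
        n += 1
    return n

if __name__ == "__main__":
    for z in (0.1, 0.3, 0.5, 1.0):
        print(z, round(approx_sample_size(z, 0.8), 1), two_sided_sample_size(z, 0.8))
```

The integer search is deliberately naive; for the sample sizes in question it needs at most a few tens of thousands of evaluations.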


2.5 Simple Hypothesis against Composite Alternative

The problem here is to find the distribution under $H_1$, which
may be different for different values of the parameter $\theta$. Suppose, for
instance, that $H_0$ is the hypothesis that a sample of N observations has
been drawn from a normal population of mean $\mu_0$ and variance $\sigma^2$. The
alternative hypothesis is that the mean is $\mu$ and variance $\sigma^2$, where
$\mu \ne \mu_0$.

On the null hypothesis the quantity $\chi^2 = \sum (x_i - \mu_0)^2/\sigma^2$
is distributed as $\chi^2$ with N degrees of freedom. If $f(\chi^2)$ is the
probability density for $\chi^2$, and if $\chi^2_{N,\alpha}$ is defined by

$$\Pr\{\chi^2 > \chi^2_{N,\alpha}\} = \alpha, \tag{2.13}$$

an unbiased test of $H_0$, of size $\alpha$, is given by the rule of rejecting
$H_0$ when

$$\sum (x_i - \mu_0)^2/\sigma^2 > \chi^2_{N,\alpha}. \tag{2.14}$$

Under the alternative hypothesis, the quantity $\chi'^2 = \sum (x_i - \mu_0)^2/\sigma^2$
follows the non-central $\chi^2$ distribution, with N degrees of freedom.
This distribution depends on a parameter

$$\lambda = N(\mu - \mu_0)^2/\sigma^2 \tag{2.15}$$

and the probability density of $\chi'^2$ is

$$f(x) = \tfrac{1}{2}\, e^{-\lambda/2} (x/2)^{N/2 - 1} e^{-x/2} \sum_{m=0}^{\infty} \frac{(\lambda x/4)^m}{m!\, \Gamma(m + \tfrac{1}{2}N)}. \tag{2.16}$$

When $\lambda = 0$, f(x) reduces to the ordinary $\chi^2$ distribution.
The power of the test is

$$P = \Pr\{\chi'^2 > \chi^2_{N,\alpha}\} = \int_{\chi^2_{N,\alpha}}^{\infty} f(x)\, dx. \tag{2.17}$$

Numerical tables of non-central $\chi^2$, giving $\lambda$ for specified
values of P, have been prepared by E. Fix (1949). Curves derived
from these tables for $\alpha = 0.05$ are given in Fig. 6. If, for example,
N = 10, $\mu_0 = 0$, $\mu = 0.5$, and $\sigma = 1$, we have $\lambda = 2.5$. It is
evident from the curve that the power of the test is about 0.13. If N
were 40, $\lambda$ would be 10 and the power would be 0.28.

2.6 Composite Hypothesis against Simple Alternative

The general problem of testing a composite null hypothesis
$H_0$ ($\theta \in \omega$) against a simple alternative $H_1$ ($\theta = \theta_1$, where $\theta_1 \notin \omega$)
has been considered by Lehmann and Stein, 1948.

Let us denote the distribution function of x under $H_1$ by G(x)
and that under $H_0$ by $F_\theta(x)$, to indicate that it depends on the parameter
$\theta$. The procedure adopted by Lehmann and Stein consists in replacing
the composite hypothesis $H_0$ by a simple hypothesis $H_0'$, this
hypothesis being that $\theta$ has a particular distribution over $\omega$, so chosen
that the problem of distinguishing between $H_0'$ and $H_1$ is as difficult
as possible. If we then find a most powerful test for $H_0'$ against $H_1$
it turns out to be, under certain conditions, also most powerful for
$H_0$ against $H_1$.

It is often possible to guess the least favorable distribution.
With the same assumption as in § 2.1, let the null hypothesis be that
$\mu \le \mu_0$ and the alternative hypothesis that $\mu = \mu_1$ where $\mu_1 > \mu_0$. It
seems fairly obvious that it will be more difficult to distinguish between
$\mu$ and $\mu_1$ when $\mu = \mu_0$ than when $\mu < \mu_0$. The least favourable
distribution of $\mu$ will therefore be a concentration at $\mu_0$. In other
words, the distribution function of $\mu$ is a step function with a single
step of height 1 at $\mu_0$, so that the simple null hypothesis $H_0'$ is
that $\mu = \mu_0$. The problem is reduced to that of § 2.1.


The most powerful test of size $\alpha$ of $H_0'$ against $H_1$ is to
reject with probability $\psi(\bar{x})$, where

$$\psi(\bar{x}) = 1 \text{ when } \bar{x} > c, \qquad \psi(\bar{x}) = 0 \text{ when } \bar{x} < c, \tag{2.18}$$

c being determined by

$$\Pr\{\bar{x} > c \mid \mu_0\} = \alpha.$$

Now if $\Pr\{\bar{x} > c \mid \mu\} \le \alpha$ for all $\mu \le \mu_0$, the test for $H_0'$
against $H_1$ is also most powerful for $H_0$ against $H_1$. But this is true,
because, as is evident from Fig. 7, the area beyond $\bar{x} = c$ gets
smaller and smaller as $\mu$ moves further to the left from $\mu_0$. The
test function (2.18) is therefore most powerful for $H_0$ against $H_1$,
and since it does not depend on the value of $\mu_1$ it is a uniformly
most powerful test of the hypothesis $\mu \le \mu_0$ against the composite
alternative hypothesis $\mu > \mu_0$.

Again, let us suppose that we want to test the composite
hypothesis that either $\mu \ge \mu_0$ or $\mu \le -\mu_0$ against the alternative simple
hypothesis that $\mu = 0$. It seems intuitively obvious that the least
favorable distribution of $\mu$ for the test would be given by a probability
of 1/2 for each of the values $\mu_0$ and $-\mu_0$. On this assumption we have

$$f'(\bar{x}) = \tfrac{1}{2}(N/2\pi\sigma^2)^{1/2}\left[e^{-N(\bar{x} - \mu_0)^2/2\sigma^2} + e^{-N(\bar{x} + \mu_0)^2/2\sigma^2}\right],$$

$$g(\bar{x}) = (N/2\pi\sigma^2)^{1/2} e^{-N\bar{x}^2/2\sigma^2}. \tag{2.19}$$

The most powerful test of size $\alpha$ for $H_0'$ is given by putting

$$\psi(\bar{x}) = 1 \text{ when } f'(\bar{x}) < c\, g(\bar{x}),$$

$$\psi(\bar{x}) = 0 \text{ when } f'(\bar{x}) > c\, g(\bar{x}), \tag{2.20}$$

where c is determined by the size of the test, namely

$$\Pr\{f'(\bar{x}) < c\, g(\bar{x}) \mid H_0'\} = \alpha. \tag{2.21}$$

Now the first condition of (2.20) is equivalent to: $\psi(\bar{x}) = 1$ when

$$\exp\left[-N(\mu_0^2 - 2\mu_0\bar{x})/2\sigma^2\right] + \exp\left[-N(\mu_0^2 + 2\mu_0\bar{x})/2\sigma^2\right] < c,$$

that is, when

$$e^{-N\mu_0^2/2\sigma^2}\left[e^{N\mu_0\bar{x}/\sigma^2} + e^{-N\mu_0\bar{x}/\sigma^2}\right] < c,$$

or when

$$\cosh(N\mu_0\bar{x}/\sigma^2) < c_1,$$

where $c_1$ is another constant.

The condition may therefore be written $|\bar{x}| < c_2$,
where $c_2$ depends on $c_1$. The absolute symbol occurs because of the
symmetry of the cosh function; $\bar{x}$ may be anywhere between $-c_2$ and
$+c_2$. Since $c_2$ is merely another constant, we can drop the subscript
and call it c. This constant c is then determined by

$$\Pr\{|\bar{x}| < c \mid H_0'\} = \alpha.$$

But since the distribution function for $\mu$ is a step function with steps
1/2 at $-\mu_0$ and $\mu_0$, we have

$$\tfrac{1}{2}\Pr\{|\bar{x}| < c \mid \mu = \mu_0\} + \tfrac{1}{2}\Pr\{|\bar{x}| < c \mid \mu = -\mu_0\} = \alpha$$

and this condition will be satisfied by choosing c so that

$$\Pr\{|\bar{x}| < c \mid \mu = \mu_0\} = \Pr\{|\bar{x}| < c \mid \mu = -\mu_0\} = \alpha. \tag{2.22}$$

The distribution of $\bar{x}$ is normal with mean $\mu$ and variance
$\sigma^2/N$, so that the value of c is determined by

$$\Phi[N^{1/2}(c - \mu_0)/\sigma] - \Phi[N^{1/2}(-c - \mu_0)/\sigma] = \alpha. \tag{2.23}$$

Fig. 8
The shaded area in Fig. 8 (or the equal and symmetrically-
situated area under the other normal curve) is equal to $\alpha$. It is
easy to see that if $\mu > \mu_0$ or if $\mu < -\mu_0$ the area, for a fixed c, will be
less than that shown in the figure, so that

$$\Pr\{|\bar{x}| < c \mid \mu\} = \Pr\{|\bar{x}| < c \mid -\mu\} \le \alpha$$

for $|\mu| \ge |\mu_0|$. The test given is therefore the most powerful test of
size $\alpha$ for testing the hypothesis that $|\mu| \ge |\mu_0|$ against the hypothesis
that $\mu = 0$.

The power of the test is given by

$$\Pr\{|\bar{x}| < c \mid \mu = 0\} = \int_{-cN^{1/2}/\sigma}^{cN^{1/2}/\sigma} \phi(v)\, dv. \tag{2.24}$$


The size of sample necessary to avoid (with probability P) claiming
that a difference of the mean from zero at least as great as $z\sigma$
exists, when in fact it does not, is given by solving (2.24) for
$cN^{1/2}/\sigma$ and substituting in (2.23) for a given value of $\alpha$, with
$z = \mu_0/\sigma$. Thus for $\alpha = 0.05$, z = 0.1, and P = 0.5, we find that N = 532.
This is greater than the sample size (385) given in Table II for the
same values of $\alpha$, z and P. The reason is that we used Table II
to test the null hypothesis $\mu = 0$ against the alternative hypothesis
$|\mu| = z\sigma$, whereas now we are testing the null hypothesis $|\mu| = z\sigma$
against the alternative hypothesis $\mu = 0$. In the first case the
probability of wrongly claiming that a difference of the mean from
zero exists is equal to $\alpha$, and the power is the probability of
detecting a real difference. In the second case the probability of not
finding a real difference is equal to $\alpha$, and the power is the
probability of stating that no difference exists when this is in fact true.

If we want to be reasonably sure that we do not claim a non-existent
effect as real, but do not so much mind missing a real effect, the
first arrangement is the one to use and the sample size is given by
Table II. If we want to be reasonably sure of finding an effect if it
exists, but do not so much mind claiming a non-existent one as
real, the second arrangement is better. Some sample sizes for
this case are given in Table III.

As an example, suppose we know from previous experience
that the width of a slot in a certain metal part is apt to vary, with
a standard deviation of 2 thousandths of an inch. If we want to have
an even chance of detecting a real difference of 1 thousandth of an
inch in the mean, from an assumed standard value, with a reasonable
certainty (95%) that we shall not claim that this difference exists when
it really does not, we shall need, according to Table II, a sample of
16. If, on the other hand, we want to be 95% sure of finding this
difference if it exists, but are content with a 50% chance of stating
that it exists when it really does not, we need a sample of 22
according to Table III.
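The two-step recipe (solve (2.24) for $cN^{1/2}/\sigma$, then substitute in (2.23)) can be sketched as below. The function names and the simple upward search are assumptions made for illustration, not the method used to prepare the tables; note that the second term of (2.23) is not always negligible, which is why the exact expression is kept.

```python
from statistics import NormalDist

STD = NormalDist()

def acceptance_probability(z, n, k):
    """Left side of (2.23) with k = c sqrt(N) / sigma and z = mu_0 / sigma."""
    shift = n ** 0.5 * z
    return STD.cdf(k - shift) - STD.cdf(-k - shift)

def sample_size_no_difference(z, power, alpha=0.05):
    """Smallest N for which the test of |mu| >= z sigma against mu = 0 has
    size at most alpha and the required power."""
    # (2.24): power = 2 Phi(k) - 1, hence k = Phi^{-1}((1 + power) / 2)
    k = STD.inv_cdf((1.0 + power) / 2.0)
    n = 1
    while acceptance_probability(z, n, k) > alpha:
        n += 1
    return n

if __name__ == "__main__":
    print(sample_size_no_difference(0.1, 0.5))   # case quoted in the text
    print(sample_size_no_difference(0.5, 0.5))   # slot-width example
```

The search terminates because, for fixed k, the left side of (2.23) decreases steadily towards zero as N grows.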


TABLE III

Size of Sample necessary for Probability P of not finding a Difference
of $z\sigma$ in the Mean where none actually exists. (Normal Population).

   z \ P      0.9       0.8       0.7       0.6       0.5

   0.05      4,329     3,425     2,874     2,465     2,127
   0.1       1,083       857       719       617       532
   0.2         271       215       180       155       133
   0.3         121        96        80        69        60
   0.4          68        54        45        39        34
   0.5          44        35        29        25        22
   0.6          31        24        20        18        15
   0.7          21        18        15        13        11
   0.8          17        14        12        10         9
   0.9          14        11         9         8         7
   1.0          11         9         8         7         6

The probability of stating that a difference does not exist,
when in fact it does, is supposed to be 0.05 throughout.
III

TESTING THE VARIANCE OF A SAMPLE FROM A NORMAL
POPULATION.

3.1 Simple Hypothesis against Simple Alternative

Suppose that the variables $x_i$ (i = 1, 2, ..., N)
are independent and normally distributed with mean 0 and variance $\sigma^2$,
the value of $\sigma^2$ being unknown. Suppose also that we wish to test the
simple hypothesis $H_0$ (that $\sigma^2 = \sigma_0^2$) against the simple alternative
hypothesis $H_1$ (that $\sigma^2 = \sigma_1^2$).

The statistic used to estimate $\sigma^2$ will naturally be the
sample variance* v (= $s^2$) or, to remove bias, Nv/(N - 1). The
probability density of v, under $H_0$, is

$$f(v) = \frac{(N/2\sigma_0^2)^{n/2}}{\Gamma(n/2)}\, v^{n/2 - 1} e^{-Nv/2\sigma_0^2},$$

where n = N - 1. The probability density g(v) under $H_1$ is the same
expression with $\sigma_1^2$ substituted for $\sigma_0^2$. Therefore

$$f(v)/g(v) = (\sigma_1^2/\sigma_0^2)^{n/2} \exp\left[-\tfrac{1}{2}Nv\,(1/\sigma_0^2 - 1/\sigma_1^2)\right]. \tag{3.1}$$

First, we suppose that $\sigma_1^2 > \sigma_0^2$. Then the condition
f(v)/g(v) < c is equivalent to v > c, where the second c is another
constant. (In future we shall often use the symbol c for an undetermined
constant which may vary from one line to another. This saves writing
subscripts such as $c_1$, $c_2$, etc.)

The test consists, therefore, in rejecting the hypothesis $H_0$
when v > c, c being determined by the condition

$$\Pr\{v > c \mid H_0\} = \alpha. \tag{3.2}$$

* $v = \sum (x_i - \bar{x})^2/N$. Some writers use N - 1 instead of N in defining v,
so that v is then an unbiased estimate of $\sigma^2$.

Now when $H_0$ is true, $Nv/\sigma_0^2$ has the $\chi^2$ distribution with n
degrees of freedom, so that (3.2) is equivalent to

$$\Pr\{\chi^2_n > Nc/\sigma_0^2\} = \alpha. \tag{3.3}$$

When $\sigma_1^2 < \sigma_0^2$, the exponent in (3.1) is positive so
that f(v)/g(v) increases as v increases. The condition
f(v)/g(v) < c is then equivalent to v < c.

The power of the test (for $\sigma_1^2 > \sigma_0^2$) is given by

$$P = \Pr\{v > c \mid H_1\} = \Pr\{\chi^2_n > Nc/\sigma_1^2\}. \tag{3.4}$$

If $\chi^2_{n,\alpha}$ is such that $\Pr\{\chi^2_n > \chi^2_{n,\alpha}\} = \alpha$,
we have from (3.3) and (3.4) that

$$Nc/\sigma_0^2 = \chi^2_{n,\alpha}, \qquad Nc/\sigma_1^2 = \chi^2_{n,P},$$

so that if $k = \sigma_1^2/\sigma_0^2$,

$$k = \chi^2_{n,\alpha}/\chi^2_{n,P}. \tag{3.5}$$

For given values of k and $\alpha$, this equation may be solved
for P, given n, or for n, given P. The former solution is a
straightforward matter of interpolating in the tables of $\chi^2$ (such as
those of C. M. Thompson (1941) or A. Hald (1952)). Thus, for $\alpha$
= 0.05, and N = 10, $\chi^2_{9,\,0.05} = 16.9$. Then, for k = 2, $\chi^2_{9,P} = 8.45$,
giving P = 0.49.
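Where tables of $\chi^2$ are not to hand, the interpolation can be replaced by direct computation. The sketch below is an illustration under stated assumptions: the $\chi^2$ distribution function is evaluated by the standard series for the incomplete gamma function, percentage points are found by bisection, and all function names are hypothetical. It covers the case k > 1 of (3.5) and also the mirror-image case k < 1 treated in (3.6).

```python
import math

def chi2_cdf(x, df):
    """Pr{chi-square with df degrees of freedom <= x}, by the series for the
    regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= t / (a + n)
        total += term
    return total * math.exp(a * math.log(t) - t - math.lgamma(a))

def chi2_upper_point(df, tail):
    """Value exceeded with probability `tail`, found by bisection."""
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def variance_test_power(N, k, alpha=0.05):
    """Power of the sample-variance test for variance ratio k (k != 1)."""
    n = N - 1
    if k > 1.0:
        crit = chi2_upper_point(n, alpha)        # chi^2_{n, alpha}
        return 1.0 - chi2_cdf(crit / k, n)       # from (3.5)
    crit = chi2_upper_point(n, 1.0 - alpha)      # chi^2_{n, 1 - alpha}
    return chi2_cdf(crit / k, n)                 # from (3.6)

if __name__ == "__main__":
    print(round(variance_test_power(10, 2.0), 2))
    print(round(variance_test_power(10, 0.5), 2))
```

For N = 10 the two calls correspond to the worked values quoted in the text for k = 2 and for k < 1.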


Figure 9 gives power curves for a few values of N. When k < 1,
the inequality signs are reversed, and

$$k = \chi^2_{n,\,1-\alpha}/\chi^2_{n,\,1-P}. \tag{3.6}$$

Thus, for $\alpha = 0.05$ and N = 10, $\chi^2_{9,\,0.95} = 3.325$, and when
k = 0.5, P = 0.33.

In order to find the size of sample necessary to detect a
variation in the ratio $\sigma_1^2/\sigma_0^2 = k$ from the value 1, or in other
words, to have a known probability P of rejecting the null hypothesis
$\sigma^2 = \sigma_0^2$ when in fact the true value of $\sigma^2$ is $k\sigma_0^2$, it is
necessary to solve (3.5) or (3.6) for n. This can be most readily
done by means of the Wilson and Hilferty (1931) approximation for
$\chi^2$, according to which $(\chi^2/n)^{1/3}$ is approximately normal, with
mean 1 - 2/(9n) and variance 2/(9n). Even for n as small as 4,
the error in $\chi^2$ for $\alpha = 0.05$ is less than 1 part in 300. Writing s
for $[2/(9n)]^{1/2}$, we see that

$$(\chi^2_{n,\alpha}/n)^{1/3} = 1 - s^2 + s z_\alpha,$$

where

$$\int_{z_\alpha}^{\infty} \phi(z)\, dz = \alpha, \qquad \phi(z) = (2\pi)^{-1/2} e^{-z^2/2},$$

and similarly

$$(\chi^2_{n,P}/n)^{1/3} = 1 - s^2 + s z_P.$$

Therefore, by (3.5),

$$k^{1/3} = (1 - s^2 + s z_\alpha)/(1 - s^2 + s z_P),$$

an equation which is to be solved for s. If k < 1, replace $z_\alpha$ and $z_P$
by $z_{1-\alpha}$ and $z_{1-P}$ respectively.


A short table of values of N for different values of k and P
is given in Table IV. Fuller tables may be found in C. Eisenhart,
M. W. Hastay and W. A. Wallis (1947), Chapter 8.
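The Wilson and Hilferty equation is a quadratic in s, so the sample size follows directly. The sketch below is one possible arrangement, assuming that n is rounded up to the next integer and that N = n + 1; the function name is hypothetical and k must differ from 1.

```python
import math
from statistics import NormalDist

def wilson_hilferty_sample_size(k, power, alpha=0.05):
    """Sample size N = n + 1 from k^(1/3) = (1 - s^2 + s z_a) / (1 - s^2 + s z_p),
    where s^2 = 2 / (9 n)."""
    std = NormalDist()
    if k > 1.0:
        # z_alpha and z_P: deviates exceeded with probability alpha and P
        z_a, z_p = std.inv_cdf(1.0 - alpha), std.inv_cdf(1.0 - power)
    else:
        # z_{1-alpha} and z_{1-P} for the case k < 1
        z_a, z_p = std.inv_cdf(alpha), std.inv_cdf(power)
    r = k ** (1.0 / 3.0)
    # Rearranged: (1 - r) s^2 + (r z_p - z_a) s + (r - 1) = 0
    a, b, c = 1.0 - r, r * z_p - z_a, r - 1.0
    d = math.sqrt(b * b - 4.0 * a * c)
    # The product of the roots is -1, so exactly one root is positive
    s = max((-b + d) / (2.0 * a), (-b - d) / (2.0 * a))
    n = math.ceil(2.0 / (9.0 * s * s))
    return n + 1

if __name__ == "__main__":
    for k in (0.5, 1.5, 2.0):
        print(k, [wilson_hilferty_sample_size(k, p) for p in (0.5, 0.6, 0.7, 0.8, 0.9)])
```

Entries obtained this way may differ by a unit from a published table in borderline cases, since the approximation and the rounding rule are both assumptions of the sketch.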


3.2 Different Choice of Test Statistic

Instead of using the sample variance v as the test statistic
we may use the whole set of sample values. The probability of the
observed sample under $H_0$ is

$$p_0(x) = (2\pi\sigma_0^2)^{-N/2} \exp\left[-\sum x_i^2/2\sigma_0^2\right],$$

where x stands for the set $x_1, x_2, \ldots, x_N$. Under $H_1$ it is

$$p_1(x) = (2\pi\sigma_1^2)^{-N/2} \exp\left[-\sum x_i^2/2\sigma_1^2\right].$$

If we agree to reject $H_0$ with probability $\psi(x)$, where $\psi(x) = 1$ when
$p_0(x)/p_1(x) < c$, and $\psi(x) = 0$ when $p_0(x)/p_1(x) > c$,
the test is equivalent to

$$\psi(x) = 1 \text{ if } \sum x_i^2 > c \tag{3.7}$$

when $\sigma_1^2 > \sigma_0^2$, and

$$\psi(x) = 1 \text{ if } \sum x_i^2 < c \tag{3.8}$$

when $\sigma_1^2 < \sigma_0^2$.

Now $\sum x_i^2/\sigma_0^2$ is distributed under $H_0$ as $\chi^2$ with N
degrees of freedom, not N - 1, so that equations (3.5) and (3.6) still
hold with N instead of n. This means that for a given N the power
is slightly greater than before, or, to put it another way, the size of
sample for a given power, as determined from Table IV, may be
reduced by 1. The reason for this is that we are utilizing the known
fact that the mean of the population is zero, whereas the test using
the variance does not use this information.


TABLE IV

Size of Sample necessary to detect with Probability P a Variance Ratio
k different from 1.

   k \ P      0.5       0.6       0.7       0.8       0.9

   0.2           5         6         7         8         9
   0.4          10        13        15        18        23
   0.5          16        20        24        30        39
   0.6          26        33        41        52        69
   0.7          50        64        80       102       139
   0.8         119       155       198       257       349
   0.9         507       668       866     1,130     1,552
   1.1         580       778     1,016     1,349     1,878
   1.2         154       209       275       366       513
   1.3          74        99       134       176       248
   1.5          31        42        55        74       105
   1.7          18        24        33        44        63
   1.9          12        17        22        30        43
   2.0          11        15        19        26        37
   2.5           7         9        12        15        22

The probability of falsely rejecting the hypothesis $\sigma^2 = \sigma_0^2$ is equal
to 0.05 throughout.

3.3 Composite Hypothesis against Simple Alternative

Suppose that the random variables are independently and
normally distributed with mean $\mu$ and variance $\sigma^2$, neither of
which is known. Let the null hypothesis $H_0$ be that $\sigma = \sigma_0$, nothing
being said about the value of $\mu$, and let the alternative hypothesis $H_1$
be that $\sigma = \sigma_1$ and $\mu = \mu_1$.

If $\sigma_1 < \sigma_0$, it seems reasonable to assume that the difficulty
of distinguishing between $H_0$ and $H_1$ would be as great as possible if
we took $\mu = \mu_1$. Any probability distribution of $\mu$ over values
other than $\mu_1$ could only increase the variance $\sigma_0$ and make it still
more different from $\sigma_1$.


Figure 10. Distributions of x under hypotheses $H_0$, $H_0'$, $H_1$.

For this case, then, we take $H_0'$ as the simple hypothesis $\sigma = \sigma_0$
and $\mu = \mu_1$ (see Fig. 10) and the problem is reduced to that of
§ 3.2 with $\sum (x_i - \mu_1)^2$ instead of $\sum x_i^2$.

The test is

$$\sum (x_i - \mu_1)^2 < c, \tag{3.9}$$

where c is given by

$$\chi^2_{N,\,1-\alpha} = c/\sigma_0^2. \tag{3.10}$$

Now, on the assumption of normality in the parent population,
it may be easily shown that the probability that $\sum (x_i - \mu_1)^2 < c$
under $H_0$ is never greater than under $H_0'$, whatever the value of
$\mu$. It follows, then, that (3.9) is a most powerful test (of
size $\alpha$) of $H_0$ against $H_1$. Since this test depends on the value
of $\mu_1$, it is not uniformly most powerful against a composite
alternative $\sigma = \sigma_1$, with $\mu$ unspecified.

When $\sigma_1 > \sigma_0$ the considerations given above do not apply.
A concentrated distribution of $\mu$ at $\mu = \mu_1$ would be easier
to distinguish from $H_1$ than any other. Now it has been shown by
Neyman and Pearson that the test "reject $H_0$ in favour of $H_1$ when
Nv > c", where v is the sample variance $\sum (x_i - \bar{x})^2/N$, has
the property of similarity, that is to say, the test has the same
size whatever the value of $\mu$, and this particular test is in fact
most powerful among all similar tests for $H_0$ against $H_1$. Lehmann
and Stein, 1948, applied the method of § 2.5 to the case $\sigma_1 > \sigma_0$
by choosing as the distribution of $\mu$ that unique distribution which
reduces the likelihood ratio test for $H_0'$ against $H_1$ to the known
most powerful similar test. In this case $H_0'$ is the hypothesis
that $\sigma = \sigma_0$ and that the probability density $f(\mu)$ for any
given $\mu$ is

$$f(\mu) = \left[\frac{N}{2\pi(\sigma_1^2 - \sigma_0^2)}\right]^{1/2} \exp\left[-\frac{N(\mu - \mu_1)^2}{2(\sigma_1^2 - \sigma_0^2)}\right]. \tag{3.11}$$

The test of $H_0'$ against $H_1$ is then $\sum (x_i - \bar{x})^2 > c$, c being
given by

$$\Pr\left\{\sum (x_i - \bar{x})^2 > c \mid H_0'\right\} = \alpha. \tag{3.12}$$

This probability, however, is independent of the value of $\mu$ and
depends only on $\sigma_0$ and N. In fact,

$$\alpha = \int_{c/\sigma_0^2}^{\infty} f(\chi^2)\, d\chi^2, \tag{3.13}$$

where $f(\chi^2)$ is the probability density for $\chi^2$ with N - 1
degrees of freedom.

The test is therefore most powerful for $H_0$ against $H_1$ and
since it does not depend on $\mu_1$ or $\sigma_1$ it is a uniformly most
powerful test against all alternatives.

This test can be extended to cover the cases

$$H_0: \sigma \le \sigma_0, \qquad H_1: \sigma = \sigma_1\ (> \sigma_0),\ \mu = \mu_1;$$

and

$$H_0: \sigma \ge \sigma_0, \qquad H_1: \sigma = \sigma_1\ (< \sigma_0),\ \mu = \mu_1.$$

In both of these, the least favourable distribution of $\sigma$
would be a concentration at $\sigma = \sigma_0$. For the first case, it
is clear from (3.13) that the size of the test is $\le \alpha$ for $\sigma \le \sigma_0$,
since $c/\sigma^2$ will then be equal to or greater than $c/\sigma_0^2$. For the
second case, the same thing is true, since from (3.10)

$$\alpha = \int_0^{c/\sigma_0^2} f(\chi^2)\, d\chi^2 \tag{3.14}$$

and the integral is reduced in value by putting $\sigma$ in place of $\sigma_0$
when $\sigma > \sigma_0$.

3.4 Power of the Above Test

Case 1: $\sigma_1 < \sigma_0$. The power of the test for $H_\lambda$,
and therefore for $H_0$, against $H_1$ is

$$P = \int_{0}^{c/\sigma_1^2} f_N(\chi^2)\, d\chi^2, \tag{3.15}$$

$f_N(\chi^2)$ being the probability density for $\chi^2$ with $N$
degrees of freedom. This is readily calculated from tables of the
$\chi^2$ distribution. Thus if $\alpha = 0.05$, $N = 5$, $\mu_1 = 0$,
$\sigma_0 = 1$ and $\sigma_1 = 0.8$, we find from (3.10) that
$c = 1.145$. Then from (3.15), $P = 0.125$. This is slightly greater
than the value (0.110) given by using the sample variance.
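
The figures in this example can be checked without tables. What follows
is only a minimal sketch, assuming Python and a hand-written power
series for the $\chi^2$ distribution function; the helper name
`chi2_cdf` is ours and does not come from the report. It evaluates the
size at $c = 1.145$ and the power at $c/\sigma_1^2$ for $N = 5$ degrees
of freedom.

```python
import math

def chi2_cdf(x, df):
    """Pr[chi-square with df degrees of freedom <= x], using the power
    series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a            # k = 0 term of sum z^k / (a (a+1) ... (a+k))
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    # multiply by z^a e^{-z} / Gamma(a)
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

c = 1.145                      # lower 5% point of chi-square, 5 d.f. (from the text)
sigma0, sigma1 = 1.0, 0.8      # values used in the example
size = chi2_cdf(c / sigma0**2, 5)
power = chi2_cdf(c / sigma1**2, 5)
print(round(size, 3), round(power, 3))
```

The size comes out at about 0.05 and the power a little above 0.12,
which agrees with the quoted 0.125 to within the accuracy one would
expect from interpolation in printed tables.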


The power of the test for $H_0$ against the composite alternative
($\sigma = \sigma_1$, $\mu$ unspecified) is

$$P = \Pr\left[\sum (x_i - \mu_1)^2 < c \mid \mu, \sigma_1\right], \tag{3.16}$$

$c$ being still given by (3.10). We will take $\mu_1$ as zero, as before.
The quantity $\sum x_i^2/\sigma_1^2$ now follows a non-central $\chi^2$
distribution with $N$ degrees of freedom, depending on the parameter

$$\lambda = N\mu^2/\sigma_1^2. \tag{3.17}$$

The probability density is as given in (2.16), where
$x = \sum x_i^2/\sigma_1^2$.

The power of the test is

$$P = \int_{0}^{c/\sigma_1^2} f(x)\, dx. \tag{3.18}$$

Numerical tables of the non-central $\chi^2$ distribution, prepared by
E. Fix, do not help us here, since, for one thing, they refer to the
upper tail of the distribution, and, for another, they
require  that  <?9  - G* % . The  power  may  be  expressed  in  the  form 

p-  -T'  r , 

° //a 

where  a = cAr^  - V , b = 2 X and  X ( <*■ , ) is  the  In- 

complete Gamma  function  tabulated  by  K.  Pearson  (1922). 

The  integrand  in  (3.  19)  vanishes  at  both  limits  and  the 
integral  may  be  approximately  evaluated  by  quadrature. 

For  the  numerical  values  mentioned  above  and  for  p = O'OtT 
0.05,  we  have  X/ \ - 0.977,  c/ffj4-  = 1.  789,  and  the  power  P is  about 
0.095. 


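Case 2: $\sigma_1 > \sigma_0$. The power of the test for $H_0$ against
$H_1$ is

$$P = \int_{c/\sigma_1^2}^{\infty} f_{N-1}(\chi^2)\, d\chi^2, \tag{3.20}$$

where $c$ is given by (3.13) and $f_{N-1}(\chi^2)$ is the probability
density for $\chi^2$ with $N - 1$ degrees of freedom. The power is
therefore independent of $\mu_1$. Power curves derived from (3.16) for
$\sigma_1 < \sigma_0$ and from (3.20) for $\sigma_1 > \sigma_0$, for a
few values of $N$, are given in Fig. 9. If $\sigma_1 < \sigma_0$ the $N$
given in this figure should be reduced by 1.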

3.5 Composite Hypothesis against Composite Alternative

A more general case of testing arises when both the null hypothesis
and the alternative hypothesis are composite. Symbolically,

$$H_0: \theta \in \omega, \qquad H_1: \theta \in \Omega - \omega,$$

where $\omega$ is a specified region of the whole space $\Omega$
available for $\theta$. If $\psi(x)$ is a test of size $\alpha$, we want
it to be such that

$$\text{(i)} \quad \int \psi(x)\, dF(x \mid \theta) \le \alpha \quad \text{for } \theta \in \omega,$$

$$\text{(ii)} \quad \int \psi(x)\, dF(x \mid \theta) = \max \quad \text{for } \theta \in \Omega - \omega.$$

These conditions mean that (i) the probability of rejecting the null
hypothesis, if true, is not greater than $\alpha$ and (ii) the
probability of rejecting it if false is a maximum. If a test that
satisfies (i) also satisfies (ii) for all values of $\theta$, it is
U.M.P., but usually this is not so. Sometimes, however, we can get a
U.M.P. test if we restrict ourselves to a particular class of tests
which possess some desirable property. Among such properties are those
of invariance and unbiasedness. The meaning of an unbiased test has
been discussed in § 1.10. An invariant test is one which is invariant
under some suitable transformation of variables in the sample space;
naturally we will choose some simple and obvious transformation such
as a translation or a change of scale.


4 STUDENT'S t-TEST


4.1 The one-tailed t-test

In the examples given in Chapter II the variance under the null
hypothesis was supposed known. In 1908, W. S. Gosset ("Student")
introduced the now familiar test for the mean of a normal population,
a test which depends on the sample mean and the sample variance but
which is independent of the population variance.

Suppose that a sample of size $N$ is taken from a normal population of
mean $\mu$ and variance $\sigma^2$. Let the null hypothesis be

$$H_0: \mu = \mu_0, \quad \sigma \text{ unspecified}.$$

Without loss of generality we can put $\mu_0 = 0$ (we merely need to
subtract $\mu_0$ from all the observed sample values). An alternative
simple hypothesis is that $\mu = \mu_1$ and $\sigma = \sigma_1$, where
we will suppose first that $\mu_1 > 0$.

Under the null hypothesis, the statistic $T = \bar{x}\,(N-1)^{1/2}/s$
has the Student t-distribution with $N - 1$ degrees of freedom. The
probability density is

$$f(t) = A\left[1 + t^2/(N-1)\right]^{-N/2}, \tag{4.1}$$

where $1/A = (N-1)^{1/2}\, B[(N-1)/2,\ 1/2]$ and $B[a, b]$ is the Beta
function of $a$ and $b$. Therefore $f(t)$ is independent of $\sigma$.
The distribution is symmetrical about 0, which is the expected value
of $T$. If $t_\alpha$ is so chosen that

$$\int_{t_\alpha}^{\infty} f(t)\, dt = \alpha, \tag{4.2}$$

and if we agree to reject $H_0$ when the observed value of $T$ exceeds
$t_\alpha$, then we shall clearly commit an error of the first kind
with probability $\alpha$. Since we are considering only alternatives
with $\mu_1 > 0$, we are using only the upper tail of the
t-distribution, and the test is one-tailed.

If we suppose that $\mu_1 < 0$, we shall reject $H_0$ when
$-T > t_\alpha$, and this is also a one-tailed test, using the lower
tail.

If the null hypothesis is $H_0: \mu \le 0$ (with $\mu_1 > 0$) the
probability of error of the first kind will not be greater than
$\alpha$. This follows because the t-distribution is symmetrical with
a maximum at $t = 0$.
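
Because the sample variance in this report carries the divisor $N$,
the statistic $T$ has the factor $(N-1)^{1/2}$ rather than the more
familiar $N^{1/2}$. The sketch below, which assumes Python and uses
invented measurements purely for illustration, computes $T$ as defined
here and shows that it coincides with the usual form based on the
divisor $N - 1$ standard deviation. The function name `student_T` is
ours.

```python
import math
from statistics import mean, pstdev, stdev

def student_T(sample, mu0=0.0):
    """T = (N-1)^(1/2) * (xbar - mu0) / s, with s the divisor-N standard deviation."""
    n_obs = len(sample)
    s = pstdev(sample)                      # divisor N, as in the text
    return math.sqrt(n_obs - 1) * (mean(sample) - mu0) / s

# Hypothetical observations, not taken from the report.
data = [0.8, -0.3, 1.4, 0.6, 1.1, -0.2, 0.9, 0.5]

T = student_T(data)
# The same quantity written with the divisor-(N-1) standard deviation.
T_usual = math.sqrt(len(data)) * mean(data) / stdev(data)
print(T, T_usual)
```

The null hypothesis would be rejected when this value exceeds the
tabulated $t_\alpha$ for $N - 1$ degrees of freedom.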


4.2 The t-test and maximum likelihood

The probability density for a particular observed set of sample
values $x_1, x_2, \ldots, x_N$, on the null hypothesis, is

$$p_0 = (2\pi\sigma^2)^{-N/2} \exp\left(-\sum x_i^2 / 2\sigma^2\right),$$

so that

$$L_0 = \log p_0 = -\tfrac{N}{2}\log(2\pi\sigma^2) - \sum x_i^2/2\sigma^2 = C - N\log\sigma - N(\bar{x}^2 + s^2)/2\sigma^2, \tag{4.3}$$

where $C$ is a constant and $\bar{x}$ and $s^2$ are the sample mean and
sample variance respectively. On the alternative hypothesis the
probability density is $p_1$, and

$$L_1 = \log p_1 = C - N\log\sigma_1 - N\left[(\bar{x} - \mu_1)^2 + s^2\right]/2\sigma_1^2. \tag{4.4}$$

From (4.3), $L_0$ (and therefore $p_0$) is constant over the surface
$\sum x_i^2 = \text{constant}$. Suppose we pick a region of rejection on
each such surface, equal in area to a fraction $\alpha$ of the area of
the surface. If $\psi = 1$ when $x$ lies in this region and 0
otherwise, the expectation of $\psi$ for a given value of $\sum x_i^2$
will be $\alpha$. The whole region of rejection $R$ will be a
combination of the regions for all possible values of $\sum x_i^2$, and
is obviously independent of $\sigma$. We have

$$\int_{R} p_0\, dx_1 \cdots dx_N = \alpha. \tag{4.5}$$

The test will be most powerful if the probability of rejection of
$H_0$ when $H_1$ is true is as great as possible. That is, we must
choose $R$ so as to maximize

$$\int_{R} p_1\, dx_1 \cdots dx_N \tag{4.6}$$

subject to the condition (4.5). The solution of this problem for
$\mu_1 > 0$ by the method of Lagrange multipliers (see, for example,
Kenney and Keeping, 1951, pp. 392-3) leads to the conclusion that $R$
is defined by the condition $t > t_\alpha$, where $t_\alpha$ is given
by (4.2) and $t$ has the distribution (4.1). The method of maximum
likelihood therefore leads to the ordinary one-tailed t-test, and
since this test is independent of the particular value of $\mu_1$
chosen, it is uniformly most powerful (U.M.P.) against any $H_1$ with
$\mu_1 > 0$, whatever the values of $\sigma$ and $\sigma_1$ may be.
Similarly, if $\mu_1 < 0$ the test $-t > t_\alpha$ is U.M.P., but if
$\mu_1$ may be greater or less than zero no U.M.P. test exists.


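4.3 The power of the t-test

The power of the test is the probability of rejecting $H_0$ when $H_1$
is true, and is therefore given by (4.4) and (4.6) with $R$ defined by
$t > t_\alpha$. Now the probability of a particular set of observed
values under $H_1$, namely $p_1\, dx_1 \cdots dx_N$, can be expressed
as $f(\bar{x}, s)\, d\bar{x}\, ds$, where $f(\bar{x}, s)$ is the joint
probability density for $\bar{x}$ and $s$, and is given by

$$f(\bar{x}, s) = K s^{N-2} \exp\left\{-N\left[(\bar{x} - \mu_1)^2 + s^2\right]/2\sigma_1^2\right\}, \tag{4.7}$$

where $K = 2\,(N/2)^{N/2} / \left[\pi^{1/2}\, \Gamma\{(N-1)/2\}\, \sigma_1^{N}\right]$.

Since $t = (N-1)^{1/2}\, \bar{x}/s$, we can find $P$ by integrating (4.7)
over all $\bar{x}$ such that $\bar{x} > (N-1)^{-1/2}\, s\, t_\alpha$ and
over all $s$ from 0 to $\infty$. (For any point in this region,
$t > t_\alpha$.) That is,

$$P = K \int_{0}^{\infty} s^{N-2} e^{-N s^2/2\sigma_1^2} \int_{(N-1)^{-1/2} s\, t_\alpha}^{\infty} e^{-N(\bar{x} - \mu_1)^2/2\sigma_1^2}\, d\bar{x}\, ds.$$

On putting $z = N^{1/2}(\bar{x} - \mu_1)/\sigma_1$ and
$\chi^2 = N s^2/\sigma_1^2$, so that $z$ is a standard normal variate
and $\chi^2$ has the $\chi^2$ distribution with $N - 1$ degrees of
freedom, we find

$$P = \frac{1}{2^{n/2}\, \Gamma(n/2)} \int_{0}^{\infty} (\chi^2)^{n/2 - 1} e^{-\chi^2/2} \left[\frac{1}{(2\pi)^{1/2}} \int_{t_\alpha \chi n^{-1/2} - \rho_1}^{\infty} e^{-z^2/2}\, dz\right] d\chi^2,$$

where $n = N - 1$, and $\rho_1 = N^{1/2} \mu_1/\sigma_1$. This may be
expressed, by a change of variable, as

$$P = \int_{0}^{\infty} f_n(\chi^2) \left[1 - \Phi(t_\alpha \chi n^{-1/2} - \rho_1)\right] d\chi^2, \tag{4.8}$$

$f_n(\chi^2)$ being the probability density for $\chi^2$ with $n$
degrees of freedom and $\Phi$ the standard normal distribution
function.

The integral can be evaluated numerically when $\alpha$, $n$ and
$\rho_1$ are given, so that the power of the test is obtainable as a
function of $\mu_1$ and $\sigma_1$. The power function is not
independent of $\sigma_1$; Dantzig (1940) showed that no test of the
hypothesis $H_0$ can exist with a power function independent of
$\sigma_1$. In a number of practical cases, however, we do have some
rough knowledge of $\sigma_1$, and if so we can use the tables
calculated from (4.8) for estimating the size of sample necessary to
detect a difference between $\mu$ and $\mu_0$ of a given order of
magnitude.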
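
The numerical evaluation of (4.8) is easily carried out by machine.
The following is a rough sketch only, assuming Python, Simpson's rule,
and integration over $\chi$ rather than $\chi^2$ (which avoids the
awkward behaviour of the density near the origin); the function name
`t_test_power`, the step count and the upper limit of integration are
our own choices, and $n \ge 2$ is assumed.

```python
import math
from statistics import NormalDist

def t_test_power(rho, t_alpha, n, steps=2000):
    """Power of the one-tailed t-test from (4.8), integrating over chi.

    rho     : N^(1/2) * mu1 / sigma1
    t_alpha : tabulated upper alpha point of t with n degrees of freedom
    n       : degrees of freedom, N - 1 (assumed >= 2 here)
    """
    phi = NormalDist().cdf
    # log of the normalizing constant of the chi density with n d.f.
    log_norm = (1.0 - n / 2.0) * math.log(2.0) - math.lgamma(n / 2.0)

    def integrand(chi):
        if chi <= 0.0:
            return 0.0
        dens = math.exp(log_norm + (n - 1) * math.log(chi) - chi * chi / 2.0)
        return dens * (1.0 - phi(t_alpha * chi / math.sqrt(n) - rho))

    upper = math.sqrt(n) + 10.0          # the density is negligible beyond this
    h = upper / steps                    # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

size = t_test_power(0.0, 1.746, 16)      # rho = 0 should return alpha
power = t_test_power(3.44, 1.746, 16)    # values quoted for N = 17 in section 4.4
print(round(size, 3), round(power, 3))
```

With $\rho_1 = 0$ the routine should simply return the size of the
test, which gives a convenient check on the quadrature.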


As an example of the use of this method, suppose a new treatment is
under investigation, intended to increase the strength of a certain
alloy. The claim that it does produce an increase is tested on a
sample of $N$ pairs, each pair consisting of one specimen of the alloy
having undergone the new treatment and one specimen having had the
standard treatment, the members of a pair being in other respects as
alike as possible. The increase in strength is measured for each pair.
We shall not be interested in a possible decrease in strength, and so
we shall use a one-tailed test. The standard deviation of many
measured strengths under the standard treatment may be taken as an
estimate of $\sigma$, and the size of sample necessary to detect an
average increase of strength equal to $k\sigma$ can be determined for
fixed values of the probabilities of error of the first and second
kind.


4.4 Tables of the Power Function

If $t$ is defined as $n^{1/2}\, \bar{x}/s$ then, on the null hypothesis
$H_0$ ($\mu = 0$), $t$ has the ordinary Student t-distribution, but
when $\mu = \mu_1$ and $\sigma = \sigma_1$ (that is, on the alternative
hypothesis $H_1$) $t$ has what is known as the non-central
t-distribution. The probability density of $t$ is

$$f(t) = A \left[1 + t^2/n\right]^{-(n+1)/2} \exp\left[-\frac{n\rho_1^2}{2(n + t^2)}\right] H\!\left(\frac{\rho_1 t}{(n + t^2)^{1/2}}\right), \tag{4.9}$$

where $1/A = n^{1/2}\, B[n/2,\ 1/2]$ and

$$H(x) = \frac{1}{2^{(n-1)/2}\, \Gamma\{(n+1)/2\}} \int_{0}^{\infty} v^{n} e^{-(v - x)^2/2}\, dv.$$

When $\rho_1 = 0$, $H(0) = 1$ and (4.9) reduces to the form (4.1). The
probability integral of $f(t)$ is the power of the t test, i.e.

$$P = \int_{t_\alpha}^{\infty} f(t)\, dt,$$

which can be shown to be equivalent to (4.8).

[Fig. 11. Power Curves for the One-Tailed t Test for $\alpha = 0.05$]

Tables of $P$ were calculated by Johnson and Welch (1939). These
tables are necessarily of triple entry, since $P$ depends on $n$,
$\rho$ and $\alpha$. They may be arranged in various ways; for example,
to give $\rho$ for selected values of $n$, $\alpha$ and
$P$ ($= 1 - \beta$), or to give $\alpha$ for selected values of $n$,
$\rho$ and $P$. The former arrangement was preferred by Johnson and
Welch.

Suppose that $\alpha$ and $n$ are given. Then $t_\alpha$ is determined
from the ordinary t-table, and $\rho$ can be found, after some
preliminary calculations, from the tables of Johnson and Welch.
Directions for the use of the tables are given on p. 272 of the
reference cited.

Shorter tables, which however are better adapted to the particular
problem of the power of the one-tailed t-test, were calculated by
Neyman and Tokarska (1936). These give for $\alpha = 0.05$ and 0.01,
and for all $n$ from 1 to 30, the values of $\rho$ corresponding to
selected values of $\beta$. The curves in Fig. 11 were drawn from these
tables. They show for selected values of $N$ the power $P$
corresponding to $k = N^{-1/2}\rho$.


Suppose for instance we know that $\sigma_1 = 10$ and $N = 17$. If the
chances of each kind of error are equal to 0.05, we find from the
Neyman and Tokarska tables that $\rho = 3.44$, so that
$\mu_1 = N^{-1/2} \rho\, \sigma_1 = 8.34$. This means that we have a 95%
chance of detecting a real difference in the mean equal to 0.83 of
the standard deviation, when the probability of apparently detecting
such a difference when none really exists is 0.05. The result can be
roughly checked by Fig. 11.

As a rough approximation for moderate-sized samples,

$$\rho \approx t_\alpha - z_P \left(1 + t_\alpha^2/2n\right)^{1/2}, \tag{4.10}$$

where $z_P$ is the standard normal variate which is exceeded with
probability $P$.

With the data above, $z_P = -1.645$, so that
$\rho_1 = 1.746 + 1.645\,(1.095)^{1/2} = 3.47$, which is not greatly in
error.

Another approximation, for $N > 10$, is

$$P = \Pr\left[t < \rho - t_\alpha\right], \tag{4.11}$$

where $t$ is the ordinary central $t$ with $N - 1$ degrees of freedom.
Thus, for $P = 0.95$, with $N = 17$, $\rho = t_{0.05} + 1.746 = 3.49$.
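
The first of these approximations lends itself to a short routine. The
sketch below assumes Python; the tabulated value $t_\alpha = 1.746$
(16 degrees of freedom) is entered by hand, since the standard library
offers no t-table, and the function name `rho_normal_approx` is ours.

```python
import math
from statistics import NormalDist

def rho_normal_approx(t_alpha, power, n):
    """rho ~ t_alpha - z_P * (1 + t_alpha^2 / 2n)^(1/2),
    where z_P is the normal deviate exceeded with probability P."""
    z_p = NormalDist().inv_cdf(1.0 - power)      # Pr[Z > z_P] = P
    return t_alpha - z_p * math.sqrt(1.0 + t_alpha**2 / (2.0 * n))

t_alpha = 1.746                   # upper 5% point of t, 16 d.f., from the t-table
rho = rho_normal_approx(t_alpha, power=0.95, n=16)
mu1 = rho * 10.0 / math.sqrt(17)  # detectable shift for sigma1 = 10, N = 17
print(round(rho, 2), round(mu1, 2))
```

The value of $\rho$ obtained is about 3.47, as in the text, and the
corresponding $\mu_1$ is slightly larger than the 8.34 derived from
the exact tables.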


4.5 Test of the Difference of Means in two Samples

Suppose two samples of sizes $N_1$ and $N_2$ give means of $\bar{x}_1$
and $\bar{x}_2$, the true population means being $\mu_1$ and $\mu_2$
and the standard deviation $\sigma$ being the same for both
populations. The null hypothesis $H_0$ is that $\mu_2 - \mu_1 = 0$ and
the alternative hypothesis $H_1$ is that $\mu_2 - \mu_1 = k\sigma$
($k > 0$). The statistic

$$t = (\bar{x}_2 - \bar{x}_1) \Big/ \left[\hat{\sigma} \left(\frac{1}{N_1} + \frac{1}{N_2}\right)^{1/2}\right],$$

where $\hat{\sigma}$ is an estimate of $\sigma$ equal to
$\left[(N_1 s_1^2 + N_2 s_2^2)/(N_1 + N_2 - 2)\right]^{1/2}$, has the t
distribution with $N_1 + N_2 - 2$ degrees of freedom, on the
hypothesis that $\mu_2 - \mu_1 = 0$. We reject this hypothesis when
$t > t_\alpha$, the chance of error in so doing being $\alpha$. The
chance of error in rejecting $H_0$ (that $\mu_2 - \mu_1 \le 0$) is
therefore not greater than $\alpha$.

The quantity corresponding to $\rho$ in the above theory is now

$$\rho = (\mu_2 - \mu_1) \Big/ \left[\sigma \left(\frac{1}{N_1} + \frac{1}{N_2}\right)^{1/2}\right],$$

and the number of degrees of freedom is $n = N_1 + N_2 - 2$.

Suppose, for example, that $N_1 = N_2 = 10$, so that $n = 18$. The
value of $\rho$ for $\alpha = 0.05$ and $P = 0.9$ is 3.04, and
therefore $(\mu_2 - \mu_1)/\sigma = 1.36$. This indicates that a
difference in the means of the two populations as great as
$1.36\,\sigma$ would stand a chance of 0.9 of being detected by the
proposed test.

This result could be approximately read from Fig. 11, but in using
this figure for the two-sample problem we must take $N$ as $n + 1$ (in
this case 19) and multiply the value of $k$ by
$\left[(n+1)(1/N_1 + 1/N_2)\right]^{1/2}$. The value read from the
figure is about 0.7, which gives $(\mu_2 - \mu_1)/\sigma = 1.4$.
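
The arithmetic of this section is summarized in the sketch below,
which assumes Python; the function names are ours, and the value
$\rho = 3.04$ is the tabulated figure quoted above rather than
something the code computes. The second function forms the pooled
statistic from raw observations.

```python
import math

def detectable_difference(rho, n1, n2):
    """(mu2 - mu1) / sigma = rho * (1/N1 + 1/N2)^(1/2)."""
    return rho * math.sqrt(1.0 / n1 + 1.0 / n2)

def two_sample_t(x1, x2):
    """t = (mean2 - mean1) / [sigma_hat * (1/N1 + 1/N2)^(1/2)],
    with sigma_hat^2 = (N1*s1^2 + N2*s2^2) / (N1 + N2 - 2)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)     # equals N1 * s1^2
    ss2 = sum((x - m2) ** 2 for x in x2)     # equals N2 * s2^2
    sigma_hat = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m2 - m1) / (sigma_hat * math.sqrt(1.0 / n1 + 1.0 / n2))

k = detectable_difference(3.04, 10, 10)      # rho from the tables, n = 18
print(round(k, 2))                           # prints 1.36
```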


As another example, let us suppose that we are interested in the
difference of tensile strength between two types of casting, the
variability being about the same in the two types, and that we use a
t-test on samples of $N$ of each type. For a given $\alpha = 0.05$,
say, we can calculate the power corresponding to an assigned $k$, that
is, the chance of detecting a superiority of the one type over the
other equal in amount to $k\sigma$. We now have
$\rho = k\,(N/2)^{1/2}$ and $n = 2N - 2$. Table V, calculated from
Neyman and Tokarska's tables, gives $k$ for certain values of $N$. It
shows, for instance, that if we want to have at least an even chance
of detecting a superiority in the mean equal to one standard
deviation, we should use samples of at least 6 or 7 each.

4.6 The two-tailed t-test

In many problems we are more concerned with the magnitude of the
difference between the true population mean $\mu$ and some assumed
value $\mu_0$ than with its sign. We may want to be reasonably sure,
for example, that the mean thickness of a batch of mica washers does
not differ by more than a set amount from its nominal value, but not
be particularly concerned over whether the washers happen to be a
little too thick rather than too thin. The null hypothesis is
$\mu = \mu_0$ and the alternative hypothesis $\mu = \mu_1$,
$\sigma = \sigma_1$, where $|\mu_1 - \mu_0| = k\sigma_1$. The null
hypothesis is rejected when $|t| > t_\alpha$, $t_\alpha$ being
determined by

$$\int_{-t_\alpha}^{t_\alpha} f_0(t)\, dt = 1 - \alpha, \tag{4.12}$$

where $f_0(t)$ is the probability density for central $t$. The power
is given by

$$P = 1 - \int_{-t_\alpha}^{t_\alpha} f_1(t)\, dt, \tag{4.13}$$

where $n = N - 1$ and $\rho = N^{1/2}(\mu_1 - \mu_0)/\sigma_1 = \pm N^{1/2} k$.

The non-central t distribution is skew, and if $k$ is large the area
in one tail is negligible. Unless $N$ is very small this is true for
quite moderate values of $k$. Thus for $N = 10$ and $k = 0.216$, the
probability that $t < -t_\alpha$ (if $\mu_1 > \mu_0$) or that
$t > t_\alpha$ (if $\mu_1 < \mu_0$) is less than 0.005. In the former
case, the power is practically equal to

$$\int_{t_\alpha}^{\infty} f_1(t)\, dt, \tag{4.14}$$

where $f_1(t)$ is the probability density for non-central $t$. But by
(4.12),

$$\int_{t_\alpha}^{\infty} f_0(t)\, dt = \alpha/2,$$

so that we can use the tables of the one-tailed test for the present
problem, provided we remember that when these tables specify
$\alpha = 0.05$, we are really using $\alpha = 0.10$.


TABLE V

Power of the t-test for distinguishing between the Means of two
Samples of size N, with common $\sigma$. (Values of k such that P is
the probability of detecting a difference equal to $k\sigma$, when
$\alpha = 0.05$.)

   N \ P    0.2    0.4    0.5    0.6    0.8    0.9    0.95

     3     0.78   1.36   1.62   1.88   2.48   2.94   3.32
     4     0.64   1.11   1.32   1.52   1.99   2.35   2.65
     5     0.56   0.96   1.14   1.32   1.73   2.03   2.29
     6     0.50   0.86   1.02   1.18   1.54   1.82   2.04
     8     0.42   0.73   0.86   1.00   1.31   1.54   1.73
    10     0.37   0.65   0.76   0.88   1.16   1.36   1.53
    12     0.34   0.59   0.69   0.80   1.05   1.23   1.39
    14     0.31   0.54   0.64   0.74   0.96   1.14   1.28
    16     0.29   0.50   0.59   0.69   0.90   1.06   1.19
    25     0.23   0.40   0.47   0.54   0.71   0.84   0.94
    50     0.16   0.28   0.33   0.38   0.50   0.59   0.67

4.7 The One-tailed Test as an Invariant Test

We arrive at the same test by searching for the most powerful
invariant test of the hypothesis $H_0: \mu \le 0$ against
$H_1: \mu > 0$, the variance of the parent population being unknown.
The mean $M$ and the variance $V$ jointly form a sufficient statistic.
If we put $T = M/V^{1/2}$, then $T$ is invariant for the class of
transformations $x_i' = c\,x_i$, where $c$ is a positive constant,
since of course $M' = cM$ and $V' = c^2 V$. (This transformation
merely amounts to a change of scale in the measurement of $X$.)

The $T$ so defined differs from the $T$ defined in § 4.1 only by not
having the constant factor $(N-1)^{1/2}$, and so is essentially the
same.

If $Z = M/\sigma$ and $W = V/\sigma^2$, then on the hypothesis $H_1$,
$Z$ is normal with mean $\delta = \mu/\sigma$ and variance $1/N$, and
$NW$ has the $\chi^2$ distribution with $N - 1$ degrees of freedom.
The joint probability that $Z$ lies between $z$ and $z + dz$ and $W$
between $w$ and $w + dw$ is therefore

$$C\, e^{-N(z - \delta)^2/2}\, w^{(N-3)/2}\, e^{-Nw/2}\, dz\, dw.$$

Now $T = Z/W^{1/2}$, so that for a given value $w$ of $W$,
$T = w^{-1/2} Z$. If $f(t, w)\, dt\, dw$ is the joint probability for
$t$ and $w$, with $dt = w^{-1/2}\, dz$, we have

$$f(t, w) = C\, w^{(N-2)/2}\, e^{-N(t w^{1/2} - \delta)^2/2}\, e^{-Nw/2}. \tag{4.15}$$

The probability density for $t$, regardless of the value of $w$, is
given by integrating (4.15) over $w$ from 0 to $\infty$. That is,

$$f_\delta(t) = C \int_{0}^{\infty} w^{(N-2)/2}\, e^{-N(t w^{1/2} - \delta)^2/2}\, e^{-Nw/2}\, dw. \tag{4.16}$$

The null hypothesis is equivalent to $\delta \le 0$, and the
alternative hypothesis to $\delta > 0$. If we choose a particular
alternative $\delta_1 > 0$, and apply the method of § 2.6, the
difficulty of distinguishing between $H_0$ and $H_1$ may be expected
to be as great as possible when $\delta = 0$. If we let $H_\lambda$ be
the hypothesis that $\delta = 0$, the most powerful invariant test of
size $\alpha$ for distinguishing between $H_\lambda$ and $H_1$ will be

$$\psi(t) = 1 \quad \text{when} \quad f_0(t)/f_{\delta_1}(t) \le c.$$

This ratio can be written

$$F(t) = \left[\int_{0}^{\infty} g(v, t)\, dv\right] \Big/ \left[k \int_{0}^{\infty} g(v, t)\, e^{N\delta_1 v}\, dv\right], \tag{4.17}$$

where (for $t > 0$) $v = w^{1/2} t$, $k = e^{-N\delta_1^2/2}$ and
$g(v, t) = v^{N-1} e^{-N v^2 (1 + t^2)/2t^2}$, and it is not difficult
to show that $F(t)$ is a strictly decreasing function of $t$, i.e.
$F(t_1) < F(t_2)$ if and only if $t_1 > t_2$. This means that the test
can be written

$$\psi(t) = 1 \quad \text{when} \quad t > c,$$

where $c$ is now determined by

$$\Pr\left[t > c \mid H_\lambda\right] = \alpha, \tag{4.18}$$

and so is given by the ordinary table of $t$.

The same conclusion holds for $t < 0$, $v$ in (4.17) being now equal to
$-w^{1/2} t$, and $e^{N\delta_1 v}$ replaced by $e^{-N\delta_1 v}$.

Now, from the shape of the t distribution, it follows that the
probability that $t > c$, under $H_0: \mu \le 0$, is never greater
than it is under $H_\lambda: \mu = 0$ (compare Fig. 7, which is drawn
for the normal curve with mean $\mu_0$. The general shape of the
t-distribution resembles this curve). That is,

$$\Pr\left[t > c \mid H_0\right] \le \alpha.$$

It follows that the test is most powerful for distinguishing $H_1$
from $H_0$. Since it is independent of the particular value $\delta_1$
chosen, it is U.M.P. among invariant tests for distinguishing
$H_0$ ($\mu \le 0$) from $H_1$ ($\mu > 0$).

4.8 The Two-tailed Test as Invariant Test

Here we want to distinguish between $H_0$ ($\mu = 0$) and
$H_1$ ($\mu \ne 0$), or, in terms of the quantity
$\delta = \mu/\sigma$ introduced in § 4.7, between $\delta = 0$ and
$\delta \ne 0$. If we again use the Lehmann and Stein method, of
replacing $H_1$ by a "most difficult" simple hypothesis $H_\lambda$,
it would seem reasonable to take for $H_\lambda$ the hypothesis that
$\delta$ is equally likely to be $\pm\delta_1$, where $\delta_1$ is a
fixed number greater than 0. Instead of (4.16) we have

$$f(t) = \tfrac{1}{2} C \int_{0}^{\infty} w^{(N-2)/2}\, e^{-Nw/2} \left[e^{-N(t w^{1/2} - \delta_1)^2/2} + e^{-N(t w^{1/2} + \delta_1)^2/2}\right] dw, \tag{4.19}$$

and instead of (4.17)

$$F(t) = \left[\int_{0}^{\infty} g(v, t)\, dv\right] \Big/ \left[k \int_{0}^{\infty} g(v, t) \cosh(N\delta_1 v)\, dv\right]. \tag{4.20}$$

It can be shown that $F(t)$ decreases as $t$ increases for $t \ge 0$
and decreases as $t$ decreases when $t < 0$. The test $\psi(t) = 1$
when $F(t) < c$ is therefore equivalent to $\psi(t) = 1$ when
$|t| > c$, and this is the ordinary two-sided Student test, $c$ being
given by

$$\Pr\left[\,|t| > c \mid H_0\right] = \alpha.$$

Since it is independent of $\delta_1$, this test is U.M.P. for $H_0$
against the composite alternative (namely, $|\delta| > 0$).


5 TESTS FOR THE PROPORTION DEFECTIVE


5.1 Simple Hypothesis against Simple Alternative

Let $p$ be the observed proportion of items in a sample of $N$ which
have a certain characteristic. For convenience we shall refer to this
characteristic as that of being "defective" but of course it may be
of quite a different nature. It is merely necessary that an inspector
shall be able to say unambiguously of each item whether or not it
possesses the characteristic in question (which might for instance be
that of being of the male sex if the sample consists of animals).

If $\pi$ is the true proportion defective in the population (assumed
to be very large) and $X$ is the observed number defective out of a
sample of $N$ ($X$ is a random variable), then

$$\Pr[X = x] = \binom{N}{x} \pi^{x} (1 - \pi)^{N-x}, \tag{5.1}$$

where $\binom{N}{x} = N!/[x!\,(N-x)!]$.

Let the null hypothesis $H_0$ be that $\pi = \pi_0$ and the
alternative hypothesis $H_1$ that $\pi = \pi_1$. Then in the notation
of § 1.7,

$$f_0(x) = \binom{N}{x} \pi_0^{x} (1 - \pi_0)^{N-x},$$

$$f_1(x) = \binom{N}{x} \pi_1^{x} (1 - \pi_1)^{N-x},$$

and $L(x) = x \log(\pi_0/\pi_1) + (N - x) \log\{(1 - \pi_0)/(1 - \pi_1)\}$.

The condition for rejection, $L(x) \le c$, is therefore

$$x \left[\log\frac{\pi_0}{\pi_1} - \log\frac{1 - \pi_0}{1 - \pi_1}\right] + N \log\frac{1 - \pi_0}{1 - \pi_1} \le c. \tag{5.2}$$

First let us suppose that $\pi_1 > \pi_0$, so that
$1 - \pi_1 < 1 - \pi_0$. The coefficient of $x$ in (5.2) is negative,
and (5.2) is equivalent to

$$x \ge c, \tag{5.3}$$


where $c$ is a new constant. The value of $c$ is determined by the
size of the test, i.e. by

$$\sum_{x=0}^{N} \psi(x)\, f_0(x) \le \alpha. \tag{5.4}$$

As described in § 1.8, we can choose a suitable integer $c$, and a
suitable probability $\varphi_0$, so that for the test

$$\psi(x) = 1, \quad x > c,$$
$$\psi(x) = 0, \quad x < c, \tag{5.5}$$
$$\psi(x) = \varphi_0, \quad x = c,$$

(which means that we reject $H_0$ if $x > c$, and reject it with
probability $\varphi_0$ when $x = c$) we have

$$f_0(c)\, \varphi_0 + \sum_{x=c+1}^{N} f_0(x) = \alpha. \tag{5.6}$$

The power of the test is

$$P = f_1(c)\, \varphi_0 + \sum_{x=c+1}^{N} f_1(x). \tag{5.7}$$

For given values of $\alpha$, $\pi_0$ and $N$ ($\le 100$) we can use
the Tables of the Binomial Probability Distribution (National Bureau
of Standards, 1950, and H. G. Romig, 1953) to find $P$ as a function
of $\pi_1$, and hence construct power curves for this test. For large
$N$ and $\pi_0$ near 0.5 the distribution of $x$ under hypothesis
$H_0$ is approximately normal with mean $N\pi_0$ and variance
$N\pi_0(1 - \pi_0)$. For large $N$ and $\pi_0$ near 0 the distribution
is approximately of the Poisson type, with parameter $N\pi_0$.
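
Where binomial tables are not to hand, the integer $c$, the
randomization probability and the power can be computed directly from
the definitions. The following is a minimal sketch, assuming Python
and exact binomial probabilities; the function names are ours. It is
tried on $N = 5$, $\pi_0 = 0.5$, $\alpha = 0.05$, the first line of
Table VI(d).

```python
from math import comb

def binom_pmf(x, n, p):
    """Pr[X = x] for a binomial variate with parameters n and p."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def upper_randomized_test(n, pi0, alpha):
    """Return (c, phi0): reject if x > c, reject with probability phi0 if x == c,
    chosen so that the size is exactly alpha."""
    tail = 0.0                                   # Pr[X > c | pi0]
    for c in range(n, -1, -1):
        f_c = binom_pmf(c, n, pi0)
        if tail + f_c > alpha:
            return c, (alpha - tail) / f_c
        tail += f_c
    raise ValueError("alpha must be less than 1")

def power(n, c, phi0, pi1):
    """Probability of rejection when the true proportion is pi1."""
    upper = sum(binom_pmf(x, n, pi1) for x in range(c + 1, n + 1))
    return phi0 * binom_pmf(c, n, pi1) + upper

c, phi0 = upper_randomized_test(5, 0.5, 0.05)
print(c, round(phi0, 3), round(power(5, c, phi0, 0.9), 3))   # 4 0.12 0.63
```

Setting $\pi_1 = \pi_0$ in `power` returns $\alpha$ itself, which is a
useful check that $c$ and the randomization probability have been
found correctly.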


If $\pi_1 < \pi_0$, the test (5.2) is equivalent to $x \le c$, where
$c$ is determined by

$$f_0(c)\, \varphi_0 + \sum_{x=0}^{c-1} f_0(x) = \alpha, \tag{5.8}$$

which is equivalent to

$$\sum_{x=c}^{N} f_0(x) - f_0(c)\, \varphi_0 = 1 - \alpha.$$

The power is now given by

$$1 - P = \sum_{x=c}^{N} f_1(x) - f_1(c)\, \varphi_0. \tag{5.9}$$

[Figure: Power Curves for the Test of Proportion Defective]

Figure 12 gives the symmetrical power curves with $\pi_0 = 0.5$ for
several values of $N$, and Figure 13 illustrates the non-symmetrical
case for $\pi_0 = 0.2$. The data for these curves and for some other
values of $\pi_0$ are included in Table VI.

It is apparent from Figure 12 that to have at least a 50% chance of
detecting the difference between an actual proportion 0.7 in the
population and an assumed value 0.5, one would need a sample of about
16. To raise this chance to 80% one would need a sample of nearly 40.
This is on the assumption that the chance of falsely stating that
such a difference from 0.5 exists is only 0.05.
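
Sample sizes of this kind can also be found by direct search instead
of being read from the curves. The sketch below assumes Python and the
one-sided randomized test of exact size $\alpha$; the function names
and the search limit `n_max` are our own choices.

```python
from math import comb

def randomized_power(n, pi0, pi1, alpha):
    """Power of the one-sided randomized test of pi = pi0 against pi1 > pi0."""
    def pmf(x, p):
        return comb(n, x) * p**x * (1.0 - p) ** (n - x)
    tail0 = tail1 = 0.0                  # Pr[X > c] under pi0 and under pi1
    for c in range(n, -1, -1):
        f0, f1 = pmf(c, pi0), pmf(c, pi1)
        if tail0 + f0 > alpha:
            phi0 = (alpha - tail0) / f0  # randomization probability at x = c
            return tail1 + phi0 * f1
        tail0 += f0
        tail1 += f1
    return 1.0

def smallest_n(pi0, pi1, alpha, target, n_max=200):
    """Smallest sample size whose power reaches the target, or None."""
    for n in range(1, n_max + 1):
        if randomized_power(n, pi0, pi1, alpha) >= target:
            return n
    return None

n50 = smallest_n(0.5, 0.7, 0.05, 0.50)
n80 = smallest_n(0.5, 0.7, 0.05, 0.80)
print(n50, n80)
```

The sample sizes returned lie close to the values read from Figure 12;
exact agreement is not to be expected, since the figures in the text
are taken by eye from the curves.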

5.2 Proportion Defective when Distribution is Normal

We suppose that the objects tested have a characteristic $X$ which is
normally distributed with mean $\mu$ and standard deviation $\sigma$,
and that if $X$ is greater than some fixed value $x_0$, the object is
classed as defective. We might think of bolts, for instance, which
must be less than a certain diameter to pass through a hole of fixed
size. The proportion $\pi$ of defectives in the population will be the
area under a standard normal curve beyond the ordinate at
$(x_0 - \mu)/\sigma$. If $\bar{x}$ and $s$ are the mean and the
standard deviation of $X$ in a sample of $N$ items from the
population, an estimate of $N^{1/2}(x_0 - \mu)/\sigma$, which we will
call $\rho$, is provided by the statistic
$u = n^{1/2}(x_0 - \bar{x})/s$, $n$ being written for $N - 1$. The
probability $p$ that a standard normal variate exceeds $N^{-1/2} u$,
i.e. $1 - \Phi(N^{-1/2} u)$, is then an estimate of $\pi$.

Now

$$u = n^{1/2}(x_0 - \bar{x})/s = n^{1/2}(\rho - z)/\chi, \tag{5.10}$$

where $z = N^{1/2}(\bar{x} - \mu)/\sigma$ and $\chi^2 = N s^2/\sigma^2$.


TABLE VI

Power of Test for Proportion Defective

This gives, for assigned $\pi_0$ and $\alpha = 0.05$, the power of the
randomized test for the proportion $\pi_1$ of defectives in the
population against an assumed value $\pi_0$, for a sample of size $N$.
The test consists in rejecting $H_0$ if the observed number $x$ of
defectives is greater than $c$ or less than $c'$. If $x = c$, $H_0$ is
rejected with probability $\varphi$, and if $x = c'$ with probability
$\varphi'$.

(a) $\pi_0 = 0.2$

(b) $\pi_0 = 0.3$

(c) $\pi_0 = 0.4$

(d) $\pi_0 = 0.5$

                              Power for pi_1 =
                              0.1     0.2     0.3     0.4     0.5
    N    c    c'   phi        0.9     0.8     0.7     0.6     0.5

    5    4    1    0.120     .630    .377    .211    .109    .05
   10    8    2    0.893     .909    .646    .358    .154    .05
   15   11    4    0.778     .978    .794    .467    .189    .05
   20   14    6    0.793     .996    .891    .568    .224    .05
   30   19   11    0.0124   1.000    .975    .732    .293    .05
   40   25   15    0.264    1.000    .993    .828    .350    .05

For values of $\pi_0 > 0.5$, use the above tables with $1 - \pi_0$ and
$1 - \pi_1$ instead of $\pi_0$ and $\pi_1$. The values of $c$ and $c'$
will be $N - c'$ and $N - c$ of the above tables respectively, and the
values of $\varphi$ and $\varphi'$ will be interchanged. Thus for
$\pi_0 = 0.6$ and $N = 40$, $c = 29$ and $c' = 19$, $\varphi = 0.414$
and $\varphi' = 0.308$.


66 
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In  this  expression,  z is  a standard  normal  variate  and  JL ^ 
has  the  ')J‘  distribution  with  n degrees  of  freedom;  u,  therefore,  has 
the  non-central  t distribution  of  (4.  9)>  with  £ instead  of  f • . In 
equation  (4.  9),  t was  defined  as  n1/2  (z  + p^  ) / % , but  we  can 
easily  see  that  if  the  signs  of  both  t and  p(  are  changed,  f(t)  is  un- 
altered, and  we  note  that  the  double  change  brings  us  to  the 
definition  of  (5.  10). 

If  p is  known,  the  value  of  u (say  u^  ) such  that  Pr  ^ u > J 
= P is  given  by 


U*  = t«  , 

where  tw  depends  on  n,  p and  P,  and  can  be  found  from  the  tables  of 
non-central  t.  Hence,  if  we  find  u for  a sample  of  N items  from  the 
population,  we  can  fix  confidence  limits  for  p by  supposing  that  p is 
such  that  u = uw  . We  read  from  the  tables  the  value  of  p correspond- 
ing to  the  given  n,  t(=u)  and  P,  and  our  upper  confidence  limit  for  tt 
(supposing  that  P < 0.5)  is  1 - p). 

The  lower  confidence  limit  is  found  similarly  by  replacing  P 
by  1 - P.  The  confidence  coefficient  corresponding  to  these  limits 
is  1 - 2P. 

Suppose for example that we want 90% confidence limits for
the proportion of defectives in the population, as determined from a
sample of size 50. We will agree to regard an object as defective if
the value of $X$ for this object exceeds 1.645. From the sample we
find, say, $\bar{x} = 0.14$, $s = 0.90$, so that $u = 7(1.505)/0.90 = 11.7$.

Putting $P = 0.05$, $t = 11.7$, $n = 49$, we find from the tables of
Johnson and Welch that $\rho = 9.12$, so that $N^{-1/2}\rho = 1.29$ and the upper
confidence limit for $\pi$ is 0.099. Putting $P = 0.95$, we get $N^{-1/2}\rho =
2.01$, so that the lower confidence limit is 0.022. The estimate of $\pi$
given by $1 - \Phi(N^{-1/2}u)$ is 0.050.
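
The arithmetic of this example can be retraced as follows. This is only a sketch: the non-central $t$ tables are not reproduced, so the tabled value $\rho = 9.12$ and the quoted $N^{-1/2}\rho = 2.01$ are taken from the text as inputs, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N = 50                      # sample size
n = N - 1                   # degrees of freedom
x0, xbar, s = 1.645, 0.14, 0.90

# Statistic u = n^(1/2) (x0 - xbar) / s, as in the worked example.
u = sqrt(n) * (x0 - xbar) / s


def proportion_from_rho(rho, sample_size):
    """Proportion defective 1 - Phi(N^(-1/2) rho)."""
    return 1.0 - NormalDist().cdf(rho / sqrt(sample_size))


rho_for_upper = 9.12        # read from the Johnson and Welch tables (P = 0.05)
upper_limit = proportion_from_rho(rho_for_upper, N)

# For P = 0.95 the text quotes N^(-1/2) rho = 2.01 directly.
lower_limit = 1.0 - NormalDist().cdf(2.01)

print(round(u, 1), round(upper_limit, 3), round(lower_limit, 3))
```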

Instead of being given $x_0$, we may ask what value it should
have in order to correspond to a given value of $\pi$. By our assumptions,

    $\pi = 1 - \Phi\left(\frac{x_0 - \mu}{\sigma}\right)$,                        (5.12)

so that $x_0 = \mu + \sigma z_\pi$, where $z_\pi$ is the standard normal variate
exceeded with probability $\pi$, and is equal to $N^{-1/2}\rho$. An estimate
of $x_0$ is therefore given by

    $\hat{x}_0 = \bar{x} + s\,n^{-1/2}\rho$.                          (5.13)

In the above example, if $\pi = 0.05$, $z_\pi = 1.645$, and
$\hat{x}_0 = 0.14 + 0.90\,(50/49)^{1/2}(1.645) = 1.635$.

We can again use the non-central $t$ distribution to find confidence
limits for $x_0$.

If $u > u_\alpha$, $n^{1/2}(x_0 - \bar{x}) > s\,u_\alpha$,
so that $x_0 > \bar{x} + s\,n^{-1/2}u_\alpha$.

If therefore we find $t_\alpha$ corresponding to the given values of $n$, $\rho$ and $P$
we can put $u_\alpha = t_\alpha$ and calculate $x_0$. By taking $P = 0.05$ and $0.95$
we get upper and lower confidence limits for $x_0$.

Thus, in the same example as before, $\rho = 1.645\,(50)^{1/2} = 11.63$,
$n = 49$ and $P = 0.05$. From the tables we can find $t_\alpha = 14.60$, so
that $x_0 > 0.14 + 0.90 \times 14.60/7 = 2.02$. If $P = 0.95$, we get
$t_\alpha = 9.40$, so that $x_0 > 1.21$.

The 90% confidence limits are therefore 1.21 and 2.02.




VI.  THE F-TEST


6.1 Comparison of Variance for two Normal Populations

Suppose we wish to compare the variances $\sigma_1^2$ and $\sigma_2^2$
for two populations, known to be normally distributed. Let the null
hypothesis $H_0$ be that $\sigma_1 = \sigma_2$ and the alternative hypothesis $H_1$ that
$\sigma_1 = \lambda\sigma_2$, where we may take $\lambda > 1$. The usual test is to compute
the function:

    $F = \frac{N_1 s_1^2}{N_2 s_2^2} \cdot \frac{N_2 - 1}{N_1 - 1}$,

which is a ratio of an unbiased estimate of $\sigma_1^2$, given by the sample
variance $s_1^2$, to the corresponding estimate of $\sigma_2^2$, $N_1$ and $N_2$
being the sizes of the two samples.


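The distribution of $F$ on the null hypothesis is known. Its
probability density is

    $f(F) = \frac{(n_1/n_2)^{n_1/2}}{B(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2)} \cdot \frac{F^{n_1/2 - 1}}{(1 + n_1 F/n_2)^{(n_1 + n_2)/2}}$,

where $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$.

The hypothesis $H_0$ is rejected if $F > F_\alpha$, where

    $\int_{F_\alpha}^{\infty} f(F)\,dF = \alpha$.

The probability of rejecting $H_0$ when it is really true is then equal
to $\alpha$.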


The power of the test is given by

    $P = \Pr\{F > F_\alpha \mid H_1\}$.                                (6.4)

Now, on the hypothesis $H_1$, the ratio

    $\frac{\sigma_2^2 F}{\sigma_1^2} = \frac{N_1 s_1^2}{n_1 \sigma_1^2} \Big/ \frac{N_2 s_2^2}{n_2 \sigma_2^2}$

has the $F$ distribution, so that if $F_P$ is the value of $F$ which there
is a probability $P$ of exceeding,

    $\Pr\{\sigma_2^2 F/\sigma_1^2 > F_P\} = P$,

i.e.  $\Pr\{F > \sigma_1^2 F_P/\sigma_2^2\} = P$.                          (6.5)

Comparing (6.4) and (6.5) we see that

    $F_\alpha = \sigma_1^2 F_P/\sigma_2^2 = \lambda^2 F_P$.                          (6.6)


We can use the tables of the $F$ distribution calculated by Merrington
and Thompson (1943), which are available in abridged form in Hald's
Statistical Tables and Formulas, 1952, in order to calculate $\lambda$.
Thus, if $N_1 = N_2 = 10$, $\alpha = 0.05$ and $P = 0.5$, we find $F_\alpha = 3.18$,
$F_P = 1.00$, so that $\lambda = (3.18)^{1/2} = 1.78$.

This means that we stand an even chance of recognizing a
difference between $\sigma_1$ and $\sigma_2$ when the actual ratio is 1.78, provided
we agree to accept a 5% chance of wrongly rejecting the null hypothesis
when actually $\sigma_1 = \sigma_2$. Only for a few values of $P$ can $F_P$ be
obtained directly from the tables. When $n_1$ and $n_2$ are reasonably
large (say 30 or more) an approximation devised by A. H. Carter
(1947) may be used to obtain $F$ for other values. This approximation
is actually for Fisher's $z$, which is more nearly normal than $F$,
but since $z = \tfrac{1}{2}\log_e F$, $F$ is readily obtained from a table of
natural logarithms. The approximation consists in finding $\xi$, the
normal variate corresponding to $P$, and calculating

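    $z \approx \frac{\xi\,(h + k)^{1/2}}{h} - \left(\frac{1}{n_1} - \frac{1}{n_2}\right)\left(k + \frac{5}{6} - \frac{s}{3}\right)$,          (6.7)

where $s = \frac{1}{n_1} + \frac{1}{n_2}$, $h = \frac{2}{s}$, $k = \frac{\xi^2 - 3}{6}$.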


For $n_1 = n_2 = 19$, and $P = 0.25$, we have $\xi = 0.6745$, $k = -0.4242$,
$h = 19$, and therefore $z \approx 0.1530$. This is equivalent to $F = 1.3580$,
whereas the correct value is 1.369. In Table VII the entries for $P =
0.4$ and $0.6$ have been calculated by means of this approximation,
and checked by a set of curves drawn by Ferris, Grubbs and Weaver
(1946).
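
The numerical example can be reproduced with the short sketch below. It is an illustration under stated assumptions: the first term follows the worked figures above, while the second (correction) term is written in the form usually quoted for Carter's approximation and vanishes when $n_1 = n_2$, so it is not exercised by this example. The function name is hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def carter_z_and_F(P, n1, n2):
    """Approximate Fisher's z and F exceeded with probability P.

    Assumption: the correction term (1/n1 - 1/n2)(k + 5/6 - s/3) is the
    commonly quoted form of Carter's approximation; it is zero when
    n1 == n2.
    """
    xi = NormalDist().inv_cdf(1.0 - P)  # normal deviate exceeded with prob. P
    s = 1.0 / n1 + 1.0 / n2
    h = 2.0 / s
    k = (xi * xi - 3.0) / 6.0
    z = xi * sqrt(h + k) / h - (1.0 / n1 - 1.0 / n2) * (k + 5.0 / 6.0 - s / 3.0)
    return z, exp(2.0 * z)              # F = exp(2z) since z = (1/2) ln F


z, F = carter_z_and_F(0.25, 19, 19)
print(round(z, 4), round(F, 3))
```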


Table VII gives values of $\lambda$ for different sized samples
(the two samples are supposed equal in size, so that $N_1 = N_2 = N$),
and for certain values of $P$. This table can be used to decide roughly
what size of samples would be necessary in order to detect a given
difference of the value of $\lambda$ from unity.
TABLE VII


Values of the Standard Deviation Ratio $\lambda = \sigma_1/\sigma_2$, detectable
with power P, the samples being both of size N.

                                    P
    N     0.1    0.25    0.4     0.5     0.6    0.75    0.9    0.95

    5    1.25    1.76   2.24    2.53    2.86    3.63   5.12    6.39
   10    1.14    1.41   1.64    1.78    1.94    2.25   2.78    3.18
   15    1.11    1.31   1.47    1.58    1.68    1.89   2.24    2.48
   20    1.09    1.26   1.39    1.47    1.56    1.72   1.99    2.17
   25    1.08    1.22   1.34    1.41    1.48    1.62   1.84    1.98
   30    1.07    1.20   1.30    1.36    1.43    1.55   1.74    1.86
   61    1.05    1.13   1.20    1.24    1.28    1.35   1.46    1.53
  101    1.04    1.10   1.15    1.18    1.21    1.26   1.34    1.39

The probability of error of the first kind is 0.05;
that of error of the second kind is 1 - P.




Thus, suppose that a suggested new process of manufacture
is expected to be able to reduce the dispersion in some quantity
measured (e.g. tensile strength) for certain types of casting. If the new
process can reduce dispersion in the ratio 3 : 2, it will be worth while
changing. In order to have a 75% chance of detecting such a difference,
if it exists, we need samples of about 35. In order to have a 95%
chance, we should need samples of about 70.

6.2 The Analysis of Variance Test

The  standard  analysis  of  variance  test  is  an  F test  of  the 
hypothesis  that  there  is  no  significant  difference  between  the  "treatments" 
being  compared.  The  test  is  based  on  a comparison  of  two  independent 
estimates  of  variance,  one  calculated  from  the  treatment  effects  and 
one  from  the  "error". 

To take a simple case, suppose we have $b$ treatments, each
replicated $r$ times. If $X$ is the variable measured, and if $x_{ij}$ is the
observed value for the $i$th treatment and the $j$th replicate, the total sum
of squares is

    $Q = \sum_i \sum_j (x_{ij} - \bar{x})^2 = q_1 + q_2$,

where $q_1 = r \sum_i (\bar{x}_{i\cdot} - \bar{x})^2$ and $q_2 = \sum_i \sum_j (x_{ij} - \bar{x}_{i\cdot})^2$.

Here $\bar{x}_{i\cdot} = \frac{1}{r}\sum_j x_{ij}$, the mean value for the $i$th treatment, and $\bar{x}$ is the
overall mean. The quantities $q_1$ and $q_2$ are called the sum of squares
between treatments and the sum of squares within treatments respectively.
The latter depends only on the minor variations between replicates
undergoing the same treatment, and the corresponding mean square
$q_2/[(r - 1)b]$ is the error estimate of variance. The former depends
on the average treatment effects, and the mean square $q_1/(b - 1)$ is
the estimate of variance based on treatment differences.

On the null hypothesis,

    $F = \frac{q_1}{b - 1} \cdot \frac{b(r - 1)}{q_2}$

has the $F$ distribution with $n_1$ and $n_2$ degrees of freedom, where
$n_1 = b - 1$ and $n_2 = b(r - 1)$. Since $F = n_2 q_1/(n_1 q_2)$,

    $\frac{q_1}{q_1 + q_2} = \frac{n_1 F}{n_1 F + n_2}$.

This quantity, denoted by $E^2$, has been used in the tables
prepared by P. C. Tang (1938) for the power function of the analysis
of variance test.

On the alternative hypothesis that the true effect of the $i$th
treatment is $\alpha_i$ (where we may suppose the origin so selected that
$\sum_i \alpha_i = 0$), the variance of treatment effects is

    $\sigma_\alpha^2 = \frac{1}{b} \sum_i \alpha_i^2$.

If the true variance of the population sampled is $\sigma^2$ (irrespective
of the magnitude of the treatment effects), the variance of a treatment
mean is $\sigma_m^2 = \sigma^2/r$. The ratio

    $\phi = \sigma_\alpha/\sigma_m$

is used in Tang's tables as an argument. When $\phi = 0$ the quantity
$q_1/\sigma^2$ has the ordinary $\chi^2$ distribution with $n_1$ degrees of freedom.
When $\phi \neq 0$, it has a non-central $\chi^2$ distribution, the probability density
being

    $f(\chi'^2) = \tfrac{1}{2}\, e^{-(\lambda + \chi'^2)/2}\, (\tfrac{1}{2}\chi'^2)^{(b-3)/2}\, K(\tfrac{1}{4}\lambda\chi'^2)$,          (6.10)

where $\chi'^2$ is written for $q_1/\sigma^2$, and $\lambda$ for $b\phi^2$, and where $K(x)$ is
an infinite series:

    $K(x) = \sum_{m=0}^{\infty} \frac{x^m}{m!\,\Gamma(m + (b-1)/2)} = \frac{1}{\Gamma(a)}\left[1 + \frac{x}{1!\,a} + \frac{x^2}{2!\,a(a+1)} + \cdots\right]$,  $a = (b-1)/2$.

When $\lambda = 0$, $K(\tfrac{1}{4}\lambda\chi'^2)$ in (6.10) reduces to $K(0) = 1/\Gamma(a)$, and
(6.10) is then the ordinary $\chi^2$ density function with $b - 1$ degrees of
freedom.
The density function for $E^2$, which is obtainable from the
non-central $\chi^2$ density for $q_1/\sigma^2$ and the central $\chi^2$ density for
$q_2/\sigma^2$, is

    $f(E^2) = \frac{e^{-\lambda/2}}{B(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2)}\, (E^2)^{n_1/2 - 1} (1 - E^2)^{n_2/2 - 1}\, H(\tfrac{1}{2}\lambda E^2)$,      (6.12)

where $\lambda = b\phi^2$,

    $H(x) = 1 + \frac{n_1 + n_2}{n_1}\,\frac{x}{1!} + \frac{(n_1 + n_2)(n_1 + n_2 + 2)}{n_1 (n_1 + 2)}\,\frac{x^2}{2!} + \cdots$,      (6.13)

and where $n_1 = b - 1$, $n_2 = br - b$, $n_1 + n_2 = br - 1$. The function
$H(x)$ is called a confluent hypergeometric function. When $\lambda = 0$,
that is when $\phi = 0$, we have $H(\tfrac{1}{2}\lambda E^2) = 1$ and $E^2$ is a Beta-variate.
The probability of error of the first kind, if the null hypothesis is
rejected when $E^2 > E^2_\alpha$, is given by

    $\alpha = \int_{E^2_\alpha}^{1} f(E^2 \mid \lambda = 0)\, dE^2$.                    (6.14)

For a given $\alpha$, $E^2_\alpha$ is found from the tables of the Incomplete Beta
Function. The power of the test is given by

    $P = 1 - \beta = \int_{E^2_\alpha}^{1} f(E^2)\, dE^2$                    (6.15)

and can be calculated numerically, $f(E^2)$ being now given by (6.12),
with $\lambda \neq 0$. Tang's tables give, for different numbers of degrees
of freedom, the corresponding values of $E^2_\alpha$ for $\alpha = 0.05$ and $0.01$,
and also the power $P$ for selected values of $\phi$. Emma Lehmer (1944)
has published inverse tables which for selected values of $P$ give the
corresponding values of $\phi$, both for $\alpha = 0.05$ and for $\alpha = 0.01$.


In the special case when $b = 2$ (and therefore $n_1 = 1$), there
is only one set of $\alpha_i$ which will yield a given $\phi$ and at the same time
satisfy the relation $\sum \alpha_i = 0$. If the true treatment means for the first
and second treatments are $\mu_0 = \mu + \alpha_1$ and $\mu_1 = \mu + \alpha_2$, we
clearly have $-\alpha_1 = \alpha_2 = \tfrac{1}{2}(\mu_1 - \mu_0)$, so that $\sigma_\alpha^2 = \tfrac{1}{4}(\mu_1 - \mu_0)^2$ and
$\phi = r^{1/2}(\mu_1 - \mu_0)/(2\sigma)$. The degrees of freedom $n_1$ and $n_2$ are 1 and
$2(r - 1)$ respectively.

The quantity $\rho$ of Neyman and Tokarska's tables (see § 4.5) is here
$\rho = (r/2)^{1/2}(\mu_1 - \mu_0)/\sigma$, since $N_1 = N_2 = r$, so that $\rho = \sqrt{2}\,\phi$.
However, Tang's tables are for the two-tailed test and Neyman and
Tokarska's for the one-tailed test, so that Tang's (or Lehmer's)
level $\alpha = 0.05$ corresponds to Neyman's level $\alpha = 0.025$.

6.3 An Example

This example is given by Tang. Suppose we have four
treatments, each with five replications, in a randomized block
experiment. Then $n_1 = 3$ and $n_2 = 16 - 4 = 12$ (four degrees of
freedom are allowed between blocks). Let the treatment differences
(expressed as percentages of the mean yield) be -5, -4, 3, 6, and
let the standard deviation (estimated from past experience) be 10%
of the mean. Then

    $\phi^2 = \frac{r}{b} \cdot \frac{\sum \alpha_i^2}{\sigma^2} = \frac{5}{4} \cdot \frac{86}{100} = 1.075$, so that $\phi = 1.04$.

Tang's tables for the 5% significance level show that when $\phi = 1$,
$P = 0.269$ and when $\phi = 1.5$, $P = 0.556$. This suggests that $P$ is
about 0.3, so that if the true treatment differences were as indicated
above, the chance of detecting them at the 5% level would be only
about 3 out of 10.
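
The calculation of $\phi$, and the rough reading of the power between the two tabled values, can be sketched as follows. The linear interpolation is only an illustrative shortcut (an assumption, not Tang's procedure), and the function names are hypothetical.

```python
from math import sqrt


def tang_phi(effects, sigma, r):
    """phi = sigma_alpha / sigma_m, with sigma_alpha^2 = mean of squared
    treatment effects and sigma_m^2 = sigma^2 / r."""
    b = len(effects)
    return sqrt(r * sum(a * a for a in effects) / (b * sigma**2))


def interpolate(x, x_lo, y_lo, x_hi, y_hi):
    """Straight-line interpolation between two tabled points."""
    return y_lo + (x - x_lo) * (y_hi - y_lo) / (x_hi - x_lo)


phi = tang_phi([-5, -4, 3, 6], sigma=10, r=5)

# Tabled powers quoted in the text: P = 0.269 at phi = 1, P = 0.556 at phi = 1.5.
approx_power = interpolate(phi, 1.0, 0.269, 1.5, 0.556)
print(round(phi, 2), round(approx_power, 2))
```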

A similar result holds for a Latin square experiment. In
a square of size $n \times n$, the degrees of freedom are $n_1 = n - 1$,
$n_2 = (n - 1)(n - 2)$.

6.4 The F-test, on the Assumption that the Treatment Effects
    are not Constant but are Normally Distributed

We can in some cases suppose that the $b$ lot means represent
a sample from a normally distributed super-population of means with
a standard deviation $\theta\sigma$, $\theta$ being a pure number. The sampling
variance for a single lot mean of size $r$ is $\sigma^2/r$, so that the total
variance among the means is

    $\theta^2\sigma^2 + \sigma^2/r = \lambda^2\sigma^2/r$,                          (6.16)

where $\lambda^2 = 1 + r\theta^2$. The null hypothesis $H_0$ is that $\theta = 0$, and the
alternative hypothesis $H_1$ is that $\theta > 0$.

On hypothesis $H_0$, $b(r - 1)q_1/[(b - 1)q_2]$ has the $F$ distribution
with $b - 1$ and $b(r - 1)$ degrees of freedom. On $H_1$, the quantity
$b(r - 1)q_1/[\lambda^2 (b - 1)q_2]$ has the $F$ distribution with the same degrees
of freedom.

The hypothesis $H_0$ will be rejected when $F > F_\alpha$, $F$ here
standing for $[b(r - 1)q_1]/[(b - 1)q_2]$. If $H_1$ is true, $F > F_\alpha$
implies that $F/\lambda^2 > F_\alpha/\lambda^2$, and the probability of this is the power
of the test. Therefore, since $F/\lambda^2$ has the $F$ distribution,

    $P = \int_{F_\alpha/\lambda^2}^{\infty} f(F)\,dF$,                          (6.17)

and this can be found from tables of the $F$ function.

Table VIII has been calculated from Merrington & Thompson's
tables (1943) of the $F$ distribution. It gives for $\alpha = 0.05$ and $P = 0.5$,
the value of $\theta$ corresponding to selected values of $b$ and $r$. That is to
say, it gives the standard deviation of lot means, as a fraction of the
standard deviation of the population, which has an even chance of
being detected at the 5% significance level, as a result of an analysis
of variance based on $b$ lots with $r$ replicates in each lot. Thus, with
3 lots, one would need at least 5 items in each sample in order to
stand an even chance of finding a significant component of variance
between lot means, if actually this component were as large as the
population variance ($\theta = 1$).
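
The relation behind such a table can be sketched in a few lines. Setting the power equal to $P$ requires $F_\alpha/\lambda^2 = F_P$, so that $\lambda^2 = F_\alpha/F_P$ and $\theta = [(\lambda^2 - 1)/r]^{1/2}$. The sketch below assumes the two $F$ percentage points are supplied by the reader from tables; the function name is illustrative. The example uses $b = 2$, $r = 2$ (1 and 2 degrees of freedom), for which the upper 5% point is 18.51 and the median is 2/3.

```python
from math import sqrt


def detectable_theta(F_alpha, F_P, r):
    """theta detectable with power P, given the upper alpha point F_alpha
    and the point F_P exceeded with probability P (same degrees of
    freedom), using lambda^2 = F_alpha / F_P = 1 + r * theta^2."""
    lam_sq = F_alpha / F_P
    if lam_sq < 1.0:
        raise ValueError("F_alpha / F_P must be at least 1")
    return sqrt((lam_sq - 1.0) / r)


# b = 2 lots, r = 2 replicates: F has 1 and 2 degrees of freedom.
theta = detectable_theta(F_alpha=18.51, F_P=2.0 / 3.0, r=2)
print(round(theta, 2))
```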
TABLE VIII


Ratio of Standard Deviation of Lot Means to Standard Deviation of
Population, detectable with Power 0.5 at Significance Level 0.05,
for b Treatments (or Lots) with r Replicates.

                                   b
   r      2     3     4     5     6     7     8     9    10

   2   3.66  2.22  1.73  1.48  1.32  1.21  1.13  1.06  1.01
   3   2.09  1.37  1.11  0.98  0.89  0.83  0.78  0.74  0.71
   4   1.63  1.08  0.89  0.79  0.72  0.65  0.63  0.60  0.58
   5   1.39  0.93  0.77  0.65  0.62  0.58  0.55  0.52  0.50
   6   1.23  0.82  0.68  0.61  0.55  0.52  0.49  0.47  0.45
   7   1.12  0.75  0.62  0.55  0.51  0.47  0.45  0.43  0.41
   8   1.04  0.69  0.58  0.51  0.47  0.44  0.41  0.40  0.38
   9   0.97  0.65  0.54  0.48  0.44  0.41  0.39  0.37  0.36
  10   0.91  0.61  0.51  0.45  0.41  0.39  0.37  0.35  0.34
  11   0.86  0.58  0.48  0.43  0.39  0.37  0.35  0.33  0.32
  12   0.82  0.55  0.46  0.41  0.37  0.35  0.33  0.32  0.30
  16   0.70  0.47  0.39  0.35  0.32  0.30  0.28  0.27  0.26
  21   0.61  0.41  0.34  0.30  0.28  0.26  0.25  0.24  0.23
  26   0.56  0.37  0.31  0.28  0.25  0.24  0.22  0.21  0.21
  31   0.50  0.33  0.28  0.25  0.23  0.21  0.20  0.19  0.19
  61   0.35  0.24  0.20  0.17  0.16  0.15  0.14  0.14  0.13
VII.  DISTRIBUTION-FREE TESTS


7.1 Van der Waerden's Test for the Difference of Means of two
    Samples

The chief objection to Student's test for the difference of
means is the necessity for assuming normality in the distributions.
Various tests have been derived which do not require this assumption,
but such tests are usually considerably less powerful than
Student's test, particularly when the samples are fairly large.
Van der Waerden (1935) has described a test which does not require
the assumption of normality but which, if the distributions are
normal, is asymptotically as powerful as Student's test.

Suppose there are $m$ observations $x_1, \ldots, x_m$ and $n$
observations $y_1, y_2, \ldots, y_n$. Let the means of these two samples
be $\bar{x}$ and $\bar{y}$ respectively and let $D = \bar{x} - \bar{y}$. The null hypothesis to
be tested is that $D = 0$ and the alternative hypothesis is that $D > 0$.

The method consists in placing all the observations in
order of increasing size (the x's and y's mixed up together),
labelling them $z_1, z_2, \ldots, z_N$ ($N = m + n$), and associating with each
$z_k$ ($k = 1, 2, \ldots, N$) a standardized normal variate $\xi_k$, defined
by $\Phi(\xi_k) = k/(N + 1)$. This means that the $\xi_k$ are normal
deviates corresponding to values of the cumulative distribution
function which are evenly spaced between 0 and 1. Thus if $m = n = 5$,
there are 10 values of $\xi_k$, given by $\Phi(\xi_1) = 1/11$, $\Phi(\xi_2) = 2/11$, ...,
$\Phi(\xi_{10}) = 10/11$. If we denote the inverse function by $\Psi$, we can
write

    $\xi_k = \Psi[k/(N + 1)]$,                                  (7.1)

where

    $\Phi(\xi) = (2\pi)^{-1/2} \int_{-\infty}^{\xi} e^{-x^2/2}\,dx$.                      (7.2)

We now pick out and total those values of $\xi_k$ which are
associated with the x's, and which we may denote by $\xi_i$ ($i = 1, \ldots, m$).
If the x's on the whole are larger than the y's, the total $\sum \xi_i$
will be greater than zero. If this total exceeds a certain critical
value, depending on $m$ and $n$, the difference $D$ between $\bar{x}$ and $\bar{y}$ may
be regarded as significantly different from zero. A table of critical
values corresponding to the 5% significance level is given in Table IX.


Thus, suppose the following sample values are recorded:

    x:   26   29   28   24   22
    y:   23   25   21   18   20   27

Here $m = 5$, $n = 6$, $D = \bar{x} - \bar{y} = 3.47$. The values are placed in order
in the following table, with the x's shown in parentheses. The $\xi_k$ are
obtained conveniently from Kelley's Statistical Tables (1948).

    z      18   20   21   (22)    23   (24)   25   (26)    27   (28)    (29)
    k       1    2    3     4      5     6     7     8      9    10      11
    ξ_k                 -0.4307          0         0.4307       0.9674  1.3830

The sum of the $\xi_i$ is 2.3504, and the 5% critical value is 2.28. The
difference between $\bar{x}$ and $\bar{y}$ is therefore significant at the 5% level.
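
The statistic for this example can be recomputed with the sketch below, which uses the inverse normal distribution function from the Python standard library in place of printed tables. It assumes there are no ties among the observations; the function name is illustrative.

```python
from statistics import NormalDist


def van_der_waerden_sum(xs, ys):
    """Sum of the normal scores Psi[k/(N+1)] attached to the x's.

    Assumes no tied values between or within the two samples.
    """
    labelled = sorted([(v, "x") for v in xs] + [(v, "y") for v in ys])
    N = len(labelled)
    inv = NormalDist().inv_cdf          # the inverse function Psi
    return sum(
        inv(k / (N + 1))
        for k, (_, label) in enumerate(labelled, start=1)
        if label == "x"
    )


x = [26, 29, 28, 24, 22]
y = [23, 25, 21, 18, 20, 27]
total = van_der_waerden_sum(x, y)
print(round(total, 4))
```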

This test depends only on the order of the observed values.
There are actually 462 ways in which 5 x's and 6 y's can be permuted,
counting as different only those ways in which the x's occupy
different relative positions in the sequence. Of these ways, exactly
23 have $\sum \xi_i \ge 2.317$ and 24 have $\sum \xi_i \ge 2.278$. If all arrangements
are equally probable, the chance of wrongly rejecting the null
hypothesis is 23/462, which is very close to 0.05.

7.2 Calculation of Critical Values

For large values of $N$, the distribution of $\sum \xi_i$ approximates
to normal with a mean of zero. The true variance is $mnQ/(N - 1)$,
where

    $Q = \frac{1}{N} \sum_{k=1}^{N} \xi_k^2$.                                  (7.3)


TABLE IX

Critical Values of $\sum \xi_i$ for Van der Waerden's Test ($\alpha = 0.05$)


1.47  1.54 
1.54  1.70 


1 

1.43 

1.02 

1.00 

9 

1.40 

1.91 

2.01 

10 

1.73 

1.90 

2. 12 

2. 14 

11 

1.77 

2.04 

2.21 

2.28 

1a 

1.01 

2.10 

2.29 

2.40 

2.43 

13 

1.03 

2. 14 

2.34 

2.47 

2.53 

14 

1.34 

2.  13 

2.40 

2.54 

2.43 

2.44 

15 

1.34 

2.21 

2.45 

2.61 

2.71 

2.74 

20 

1.97 

2.34 

2.42 

2.84 

3.00 

3. 13 

3.21 

3.24 

3.  28 

25 

2.02 

2.42 

2.73 

2.90 

3.18 

3.35 

3.48 

3.58 

3.45 

30 

2.04 

2.40 

2.01 

3.08 

3.31 

3^  50 

44 

3.79 

3.  90 

35 

2.09 

2.52 

2.07 

3.15 

3.40 

3.  40 

3.78 

3.94 

4.  07 

40 

2.11 

2.54 

2.91 

3.21 

3.44 

3. 49 

3.88 

4.05 

4.  20 

The size of the x sample is m and that of the y sample n (N = m + n).
The x sample is that with the greater mean. If m > n, read the
above table for n instead of m.
For small values of $N$ this can easily be evaluated and for large $N$
the approximation

    $Q \approx 1 - (2 \ln N)/N + (\ln \ln N)/N$                      (7.4)

(where ln is the Napierian logarithm) is remarkably good, as
indicated in the following brief table:

     N     Q (exact)    Q (approx.)

     5      0.4486        0.4513
    10      0.6216        0.6229
    15      0.7045        0.7053
    20      0.7546        0.7553

For the normal approximation, the critical value of $\sum \xi_i$ is taken as
$\Psi(1 - \alpha)\,[mnQ/(N - 1)]^{1/2}$, where $\alpha$ is the probability of
error of the first kind. In Table IX, $\alpha = 0.05$ and $\Psi(1 - \alpha) = 1.6449$.
The critical values in this table are calculated from the normal
approximation.
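
The quantity $Q$, its logarithmic approximation, and the resulting normal-approximation critical value can be sketched as below. The exact $Q$ is taken here to be the mean of the squared normal scores, which agrees with the exact column of the brief table; the function names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_exact(N):
    """Mean of the squared normal scores Psi[k/(N+1)], k = 1..N."""
    return sum(_STD_NORMAL.inv_cdf(k / (N + 1)) ** 2 for k in range(1, N + 1)) / N


def q_approx(N):
    """Approximation (7.4) for large N."""
    return 1.0 - 2.0 * log(N) / N + log(log(N)) / N


def critical_value(m, n, alpha=0.05):
    """Normal-approximation critical value Psi(1 - alpha) [mnQ/(N-1)]^(1/2)."""
    N = m + n
    return _STD_NORMAL.inv_cdf(1.0 - alpha) * sqrt(m * n * q_exact(N) / (N - 1))


for N in (5, 10, 15, 20):
    print(N, round(q_exact(N), 4), round(q_approx(N), 4))
print(round(critical_value(5, 6), 2))
```

For $m = 5$, $n = 6$ the last line gives the value 2.28 used in the example of § 7.1.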


For  small  values  of  N,  the  probability  of  error  of  the  first 
kind,  with  these  critical  values,  will  not  be  exactly  0.05.  However, 
the  following  table  indicates  that  the  differences  from  0.  05  are  not 
serious  as  long  as  neither  m nor  n is  very  small. 


Probabilities of Error of the First Kind in using the Critical Values of
Table IX.

                      m
     N       2       3       4       5

     6     .067    .050
     7     .048    .057
     8     .071    .054    .057
     9     .055    .048    .048
    10     .044    .050    .052    .048
    11     .055    .055    .051    .050

7.3 The Power of the Van der Waerden Test


Since the test depends only on the order of the observations,
the actual distribution function for the x's or the y's is irrelevant. If
we suppose that the x's are normally distributed with mean $a$ and
variance 1 and the y's are also normally distributed with mean 0 and
variance 1, then it can be shown that $\sum \xi_i$ has asymptotically (for
fixed $m$ and for $N \to \infty$) the same distribution as $\sum x_i$, and its
standard deviation is $m^{1/2}$.

Asymptotically, Student's test consists in rejecting the
null hypothesis (namely, that $a = 0$) when $m^{1/2}\,\bar{x} > \Psi(1 - \alpha)$, i.e.
when $\sum x_i > m^{1/2}\,\Psi(1 - \alpha)$. The Van der Waerden test
rejects $H_0$ when $\sum \xi_i > \sigma\,\Psi(1 - \alpha)$, where $\sigma^2 = m(N - m)Q/(N - 1)$,
and since the distribution of $\sum \xi_i$ approaches that of $\sum x_i$ and $\sigma^2$
approaches $m$ as $N \to \infty$, the two tests are evidently asymptotically
equivalent.
7.4 Treatment of Ties


It may happen in practice that there are some ties in the
ranking, because, even though the variates are continuous, the
measurements are rounded off. These ties may be treated in different
ways:

(1)  if $q$ values of $z_k$ ($k = 1, 2, \ldots, N$) are equal to $z$, and $p$ of
     the $z_k$ are less than $z$, we assign at random the ranks $p + 1$,
     $p + 2$, ..., $p + q$ to the $q$ equal values, and therefore the
     corresponding $\xi$ are taken as $\xi_{p+1}, \ldots, \xi_{p+q}$;

(2)  we assign the ranks among the tied variates in all possible
     ways. If there are $s$ of these ways and for $r$ of them the value of
     $\sum \xi_i$ belongs to the critical region, we reject $H_0$ with
     probability $r/s$;

(3)  each set of $q$ tied values may be allotted a $\xi$ which is the
     arithmetic mean of the $\xi_{p+1}, \ldots, \xi_{p+q}$. This
     necessitates a reduction in the sum of squares of the $\xi_k$
     used in calculating $Q$, so that the critical value needs some
     adjustment. If, for example, the items ranked 8 and 9
     are judged equal in a total of 11, they may be given the
     value $\tfrac{1}{2}(0.431 + 0.674) = 0.553$. The reduction in the
     sum of squares is $\tfrac{1}{2}(0.674 - 0.431)^2 = 0.0297$.


7. 5 Terry's  Test 

M. E. Terry (1952) suggested independently a test which
is very similar to Van der Waerden's. As applied to the two-sample
problem, the null hypothesis $H_0$ is that the two samples (of $m$ and $n$
observations) come from the same continuous population, and the
test is most powerful against the alternative hypothesis $H_1$ that they
come from two normal populations with means $\mu_1$ and $\mu_2$ and
common variance $\sigma^2$, the ratio $(\mu_1 - \mu_2)/\sigma$ being sufficiently small.

The test consists in computing a statistic $c$, which is
the sum of the expected values of those $m$ items (in a sample of $m + n$
drawn from a standard normal population) which have ranks the same
as those of the $x_i$ in the observed combined sample, when the x's and
y's are placed in order. Thus, in the illustration used in § 7.1,
the x's occupy ranks 4, 6, 8, 10 and 11. From Table XX of Fisher
and Yates's Statistical Tables, the expected values for these ranks
in a sample of 11 from a standard normal population would be -0.46,
0, 0.46, 1.06, and 1.59 respectively, and the statistic $c$ is therefore
2.65. A 5% critical value for $c$ is determined by computing $c$ for the
23 or 24 permutations out of 462 which give the largest values. It
is found that 23 permutations have $c \ge 2.54$ and 24 have $c \ge 2.47$.
The 5% critical value is therefore about 2.54, and accordingly the
difference between the samples is significant at the 5% level.

Terry has shown that his statistic $c$ has on the null
hypothesis a variance

    $V(c) = \frac{mn}{N(N - 1)} \sum_{i=1}^{N} [E(z_i)]^2$,  $N = m + n$,

where $E(z_i)$ is the expected value of the item of rank $i$ in a sample
of $N$ drawn from a standard normal population. Values of $\sum [E(z_i)]^2$
are given in Fisher and Yates's Table XXI. He has also shown that if
$0 < \lim n/N < 1$ as $N \to \infty$, then the distribution of $c$ approaches
normal with mean 0 and variance $V(c)$.

The distribution of $c\,(N - 2)^{1/2}/[V(c)(N - 1) - c^2]^{1/2}$ is
approximately that of Student's $t$, with $N - 2$ degrees of freedom.
Thus, for the case given above, with $m = 5$, $n = 6$, and $N = 11$, the
value of $t_{.05}$ for 9 d.f. is 1.833 (the one-tailed test), and
$\sum [E(z_i)]^2 = 8.8892$. The critical value of $c$, determined by

    $c\,(N - 2)^{1/2}/[V(c)(N - 1) - c^2]^{1/2} = t_{.05}$,                (7.5)

is 2.57, which agrees very well with the correct value 2.54.
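
Solving the Student approximation for the critical value of $c$ gives $c^2 = t^2 (N - 1) V(c)/(N - 2 + t^2)$. The sketch below retraces the figures of the example; it assumes the variance formula $V(c) = mn \sum [E(z_i)]^2 / [N(N - 1)]$, and takes the tabled sum of squares and the $t$ value from the text as inputs. The function names are illustrative.

```python
from math import sqrt


def terry_variance(m, n, sum_sq_scores):
    """Null variance of c, given the sum of squared expected normal
    order statistics for a sample of N = m + n."""
    N = m + n
    return m * n * sum_sq_scores / (N * (N - 1))


def terry_critical_value(m, n, sum_sq_scores, t_crit):
    """Critical c from the Student approximation, solved for c."""
    N = m + n
    V = terry_variance(m, n, sum_sq_scores)
    return t_crit * sqrt((N - 1) * V / (N - 2 + t_crit**2))


# Expected normal scores for ranks 4, 6, 8, 10, 11 in a sample of 11 (from the text).
c_observed = sum([-0.46, 0.0, 0.46, 1.06, 1.59])
c_critical = terry_critical_value(5, 6, sum_sq_scores=8.8892, t_crit=1.833)
print(round(c_observed, 2), round(c_critical, 2))
```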

The power of this test has not been determined analytically, but
experimental results on random numbers indicate that the power
is not far below that of Student's $t$, even for $N$ as small as 8, when
$(\mu_1 - \mu_2)/\sigma$ is less than 0.5 or greater than 2.5. For
intermediate values there is a marked reduction in power.

Ties are treated as in Van der Waerden's test.

7.6 The Mann and Whitney (or Wilcoxon) Test

This is a rank order test of the hypothesis $H_0$ that two
sets of sample values $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ come from the
same population, against the alternative hypothesis that the x's
are stochastically larger than the y's. If the random variables $X$ and $Y$
have continuous cumulative distribution functions $F(x)$ and $G(y)$, $X$ is
stochastically larger than $Y$ when for every $a$, $F(a) < G(a)$, i.e. the
probability that $X \le a$ is smaller than the probability that $Y \le a$.

A statistic $T$ to test this was first proposed by Wilcoxon
(1945). If the x's and y's are arranged in increasing order, $T$ is the
sum of the ranks of the x's in this sequence. For the data given in
§ 7.1, these ranks are 4, 6, 8, 10, 11, so that $T = 39$. An equivalent
statistic $U$ was tabulated by Mann and Whitney (1947) and is the number
of inversions, that is, the number of times an x precedes a y. In
the example, $U = 3 + 2 + 1 = 6$, since the value $x = 22$ precedes the
three values $y = 23$, 25 and 27, $x = 24$ precedes $y = 25$ and 27, and so
on. In general

    $U = mn + m(m + 1)/2 - T$.                              (7.6)

If, under the null hypothesis, $\Pr(U \le \bar{U}) = \alpha$, the test which consists
in rejecting $H_0$ when $U \le \bar{U}$ has a size $\alpha$.

The expectation of $U$ on the null hypothesis is $nm/2$ and
its variance is $nm(n + m + 1)/12$. The limiting distribution of $U$ is
normal as both $m$ and $n$ tend to $\infty$, and for $m = n = 8$ the distribution
of $U - \tfrac{1}{2}nm$ is very close to normal. Mann and Whitney have
calculated tables, for $m$ and $n$ not greater than 8, giving the probabilities
of obtaining different possible values of $U$. Thus, for $m = 5$ and $n = 6$
($m$ and $n$ can be interchanged in the tables) we find that $\Pr(U \le 5)$ is
0.041 and $\Pr(U \le 6)$ is 0.063. If we reject $H_0$ when $U \le 6$ (and
therefore will do so in the example given in § 7.1) the probability of
error of the first kind is 0.063. If we want to keep this error below
0.05 we shall have to take $\bar{U} = 5$, and the hypothesis $H_0$ will, in this
particular example, not be rejected. Alternatively, we could reject
it with probability 0.41 (since 0.041 + 0.41 (0.063 - 0.041) = 0.05),
using a table of random numbers.
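
For small samples the statistics and their exact null distribution can be obtained by direct enumeration, as in the sketch below. It assumes no ties, treats every placement of the x's among the $N$ ranks as equally likely, and uses illustrative function names.

```python
from itertools import combinations


def rank_sum_T(xs, ys):
    """Wilcoxon's T: sum of the ranks of the x's (assumes no ties)."""
    ordered = sorted(xs + ys)
    return sum(ordered.index(x) + 1 for x in xs)


def mann_whitney_U(xs, ys):
    """Number of (x, y) pairs in which the x precedes (is below) the y."""
    return sum(1 for x in xs for y in ys if x < y)


def null_prob_U_at_most(m, n, u_max):
    """Pr(U <= u_max) when all placements of the x ranks are equally likely."""
    N = m + n
    hits = total = 0
    for ranks in combinations(range(1, N + 1), m):
        U = m * n + m * (m + 1) // 2 - sum(ranks)   # relation (7.6)
        total += 1
        hits += U <= u_max
    return hits / total


x = [26, 29, 28, 24, 22]
y = [23, 25, 21, 18, 20, 27]
T = rank_sum_T(x, y)
U = mann_whitney_U(x, y)
print(T, U, round(null_prob_U_at_most(5, 6, 5), 3), round(null_prob_U_at_most(5, 6, 6), 3))
```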

The maximum values of $U$ such that the size of the test
is not greater than 0.05 are given in Table XI. For larger $m$ and $n$
the normal approximation may be used:

    $\bar{U} = \tfrac{1}{2}(nm - 1) - 1.645\,[nm(n + m + 1)/12]^{1/2}$.          (7.7)

(The term $-\tfrac{1}{2}$ arises because of the discontinuity of $U$.)

For $n = m = 8$, this gives $\bar{U} = 15.8$. The actual probability is 0.052 for
a value less than or equal to 16 and 0.041 for a value less than or
equal to 15.
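
A minimal sketch of the normal approximation follows, taking the variance of $U$ as $nm(n + m + 1)/12$ as stated in the text; the function name is illustrative.

```python
from math import sqrt


def u_critical_normal(m, n, z=1.645):
    """Approximate 5% critical value of U, with the continuity term -1/2."""
    return 0.5 * (n * m - 1) - z * sqrt(n * m * (n + m + 1) / 12.0)


print(round(u_critical_normal(8, 8), 1))
```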


TABLE XI

Critical Values $\bar{U}$ for $\alpha \le 0.05$, for given sample sizes $m$
and $n$ ($m \le n$), in the Mann and Whitney Test.


The Mann and Whitney test is less powerful than the
Van der Waerden or the Terry test. The test is consistent, in the
sense that as $m$ and $n$ tend to infinity the probability of rejection
of the null hypothesis, when the alternative hypothesis is true,
tends to 1.

7.7 The Sign Test


This is an approximate test of the difference between two
sets of paired observations. The usual test, assuming normality of
the distributions, is the Student t-test for the mean of the differences
between pairs, the null hypothesis being that this mean is zero. The
sign test takes account merely of the signs of these differences. The
two observations belonging to one pair are assumed to be made under
conditions as alike as possible, but conditions may vary widely from
one pair to another, and this circumstance may invalidate the Student
t-test.


It is assumed that the + and - signs of the paired differences
have, on the null hypothesis, a binomial distribution. Zero differences
are ignored. If there are N non-zero differences, with x of these
positive and N - x negative, the hypothesis H0 is that in independent
sampling x is distributed according to the terms of the binomial
(1/2 + 1/2)^N. The alternative hypothesis H is that the distribution of
x is according to (q + p)^N, where q = 1 - p and p ≠ 1/2. If r is the
smaller of x and N - x, the test consists in rejecting H0 when
r ≤ r_α, r_α being a number which depends on N and on the assumed
significance level α.
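
The critical number can be obtained directly from the terms of the
binomial (1/2 + 1/2)^N. The Python sketch below is a minimal
illustration and is not taken from any published table. Since r is the
smaller of x and N - x, it assumes that the size of the test is twice
the lower tail probability, and it returns the largest r whose size
does not exceed the chosen level. The function names are hypothetical.

```python
from math import comb


def sign_test_size(N, r):
    """Size Pr(min(x, N - x) <= r) when p = 1/2, valid for r < N / 2."""
    lower_tail = sum(comb(N, k) for k in range(r + 1))
    return 2 * lower_tail / 2**N


def critical_r(N, alpha=0.05):
    """Largest r with size not exceeding alpha; None if no such r exists."""
    r = None
    for candidate in range((N - 1) // 2 + 1):  # keeps candidate < N / 2
        if sign_test_size(N, candidate) > alpha:
            break
        r = candidate
    return r


for N in (6, 17, 25):
    r = critical_r(N)
    print(N, r, round(sign_test_size(N, r), 4))
```

For N = 25, for instance, this gives r = 7, with a size of about 0.043;
for N = 5 no value of r gives a size as small as 0.05.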

A table of critical values for the application of the test
was compiled by Dixon and Mood (1946) and is reproduced in Dixon
and Massey's "Introduction to Statistical Analysis" (McGraw-Hill, 1951).
A discussion of the power of the sign test was given by W. M. Stewart
(1941).


The following questions arise: (a) what is the minimum
value of N that must be used if we want a given power for testing
against some assumed alternative H? (b) what is the maximum
value that r may have for a given N if H0 is to be rejected at
significance level α? Table XII is extracted from Stewart's paper. It
shows, for example, that in order to have at least an even chance of
detecting, at the 5% significance level, the difference between
p = 0.70 and p = 0.50, one would need at least 25 pairs, and the lesser
number of like signs must not be more than 7. Suppose, for example,
machined parts of a specified diameter are tested by a "go" and "no-go"
gauge, and on the average 50% will "go". A new machine produces
parts of which 70% go. To have an even chance of finding a significant
difference at the 5% level, one would need a sample of at least
25, and, to be significant, 18 or more should "go".


TABLE  XII 

Minimum N and Maximum r for testing with Power P the Hypothesis
that the Proportion of Signs in Paired Differences is 0.5, the True
Value of p being as given.


  Power               True value of p
    P          0.60          0.65          0.70

   0.30       54, 20        25, 7         18, 4
   0.50      101, 40        44, 15        25, 7
   0.70      158, 66        67, 25        40, 13
   0.95      327, 145      143, 59        79, 30

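  Power                     True value of p
    P         0.75       0.80       0.85       0.90       0.95

   0.30       9, 1      11, 1       7, 0        ...        ...
   0.50      18, 4      13, 2      10, 1       6, 0        ...
   0.70      25, 7    [illegible]  12, 2      10, 1       6, 0
   0.95      49, 17     35, 11     23, 6      17, 4      12, 2

(... marks an entry missing from this copy. In the rows for P = 0.30
and P = 0.50 the surviving entries are given in the order found, and
their column positions are uncertain.)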

In each pair N is the first number and r the second.

For p < 0.50 use 1 - p. If x is the number of + signs and N - x the
number of - signs, r is the smaller of x and N - x.
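
Questions (a) and (b) can also be explored numerically. The Python
sketch below assumes the rule "reject when the smaller number of like
signs is at most r", takes r as the largest value whose size does not
exceed the significance level, and searches upward for the first N at
which the power reaches the target. This search rule is an assumption
of the sketch and need not coincide with the rule used by Stewart, so
it should not be expected to reproduce every entry of Table XII. The
function names are illustrative.

```python
from math import comb


def rejection_probability(N, r, p):
    """Pr(min(x, N - x) <= r) when x is binomial (N, p); needs r < N / 2."""
    q = 1.0 - p
    lower = sum(comb(N, k) * p**k * q**(N - k) for k in range(r + 1))
    upper = sum(comb(N, k) * p**k * q**(N - k) for k in range(N - r, N + 1))
    return lower + upper


def largest_r(N, alpha=0.05):
    """Largest r whose size (rejection probability at p = 1/2) is at most
    alpha, or None if even r = 0 gives too large a size."""
    best = None
    for r in range((N - 1) // 2 + 1):  # keeps r < N / 2
        if rejection_probability(N, r, 0.5) > alpha:
            break
        best = r
    return best


def smallest_n(p, power, alpha=0.05, n_max=500):
    """First N (with its r) at which the power against p reaches the
    target; None if no N up to n_max qualifies."""
    for N in range(1, n_max + 1):
        r = largest_r(N, alpha)
        if r is not None and rejection_probability(N, r, p) >= power:
            return N, r
    return None


print(smallest_n(0.70, 0.50))
print(round(rejection_probability(25, 7, 0.70), 3))
```

With p = 0.70 and a required power of 0.50 the search stops at N = 25
with r = 7, in agreement with the example in the text; the power at
that point is about 0.51.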